Giving voice to those who can't speak, one tap at a time.
AI Partner Catalyst · Google Cloud × ElevenLabs Hackathon

Role — UX/UI Designer & Product Lead
Team — 2 (Designer + Developer)
Platform — Mobile / Android
Tools — Figma · Gemini 2.5 Flash · ElevenLabs · React
A delivery partner with a speech disability received a customer call. Unable to speak, he had no choice but to hand his phone to a stranger. This isn't a rare edge case; it's a daily reality. And no existing tool was built for it.
VocalAid is an AI-powered assistive app that listens to callers, detects intent, and lets users reply with a single tap — spoken aloud via AI voice. No typing. No speaking.
70M+ people worldwide live with a speech disorder that impacts daily communication.


3 taps. That's all it takes to complete a full phone conversation with VocalAid.
Key Insights
Existing tools fail in real-time.
They're built for asynchronous communication — not live, time-pressured phone calls.
Core Constraints We Designed For
Secondary research drew on real incidents, accessibility forums, and delivery-worker communities to validate that the problem was widespread, not isolated.
Design Principles
Tap Only
All responses via large, tappable buttons. No typing, no voice input required from the user.
Predictable Loop
Fixed, repeatable cycle builds muscle memory. No hidden states or ambiguous transitions.
Instant Response
Context-aware options surface immediately. Only intent-relevant choices shown.
Zero Cognitive Load
Fewer choices, larger targets, WCAG AAA contrast on critical actions only.
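The "WCAG AAA contrast on critical actions" principle is mechanically checkable. Below is a sketch of the WCAG 2.1 contrast-ratio formula in TypeScript; the helper names are ours for illustration, not from the app's codebase.

```typescript
type RGB = [number, number, number]; // 8-bit sRGB channels

// sRGB channel -> linearized value (WCAG 2.1 relative-luminance formula)
function linear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a color
function luminance([r, g, b]: RGB): number {
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio as defined by WCAG: (lighter + 0.05) / (darker + 0.05)
function contrastRatio(a: RGB, b: RGB): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// WCAG AAA requires at least 7:1 for normal-size text
function meetsAAA(fg: RGB, bg: RGB): boolean {
  return contrastRatio(fg, bg) >= 7;
}
```

White on black scores the maximum 21:1; a mid-grey like #777 on white falls around 4.5:1 and fails AAA, which is why the critical actions stick to near-extreme pairings.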
01
Screen State
Listening
App detects incoming speech automatically. No manual trigger needed. Status always visible.

02
Screen State
Intent Detection
The caller's speech is transcribed to text. Gemini identifies what they're asking and surfaces reply options.

03
Screen State
Respond
2–3 large tap responses appear. One tap sends the reply as spoken audio via ElevenLabs.
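The three screen states above form one fixed loop. A minimal sketch of that loop in TypeScript (matching the app's React stack); the state names mirror the UI's Listening / Thinking / Speaking labels, but the transition code itself is an assumption for illustration, not the app's actual implementation.

```typescript
// The three visible states of a call, in the order the user always sees them.
type CallState = "listening" | "thinking" | "speaking";

interface Transition {
  next: CallState;
  payload?: string; // transcript or chosen reply carried into the next state
}

// Each state has exactly one successor, so the cycle is fully predictable
// and builds muscle memory: nothing branches, nothing hides.
function advance(state: CallState, payload?: string): Transition {
  switch (state) {
    case "listening":
      // caller finished speaking -> hand the transcript to intent detection
      return { next: "thinking", payload };
    case "thinking":
      // intent resolved -> the tapped option becomes the reply to speak
      return { next: "speaking", payload };
    case "speaking":
      // reply spoken aloud -> back to listening for the caller's answer
      return { next: "listening" };
  }
}
```

Because the successor of every state is fixed, there are no hidden states or ambiguous transitions to learn, which is the "Predictable Loop" principle in code form.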

Why It Works
- No typing. No speaking. No confusion.
- A predictable loop anyone can learn in one call.
- AI voice output sounds natural, not robotic.
- System states are always transparent: Listening / Thinking / Speaking.
Under the Hood
Gemini 2.5 Flash
Interprets caller intent in real time.
ElevenLabs
Natural AI voice speaks the selected reply.
React + Google Cloud
Mobile-first frontend, cloud infrastructure.
WCAG AAA contrast on every critical action. 48dp touch targets, meeting Android's own accessibility guidelines.
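The stack composes into a single turn handler: transcript in, spoken reply out. This is a hypothetical sketch only; the function names, data shapes, and the injected-services pattern are our assumptions, not the app's code. In the real app, `detectIntent` would call Gemini 2.5 Flash and `synthesize` would call the ElevenLabs text-to-speech API.

```typescript
// The two AI services, injected as plain functions so the turn logic
// stays testable without network calls.
interface Services {
  detectIntent: (transcript: string) => Promise<string[]>; // Gemini: 2-3 reply options
  synthesize: (reply: string) => Promise<Uint8Array>;      // ElevenLabs: audio bytes
}

// One turn of a call: transcribe -> detect intent -> user taps -> speak.
async function handleTurn(
  transcript: string,
  pickOption: (options: string[]) => number, // the user's single tap
  services: Services
): Promise<Uint8Array> {
  const options = await services.detectIntent(transcript);
  const chosen = options[pickOption(options)];
  return services.synthesize(chosen);
}
```

Keeping the services behind an interface also means the AI backends can be swapped (or stubbed in tests) without touching the three-state interaction loop.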
Hi-Fi Designs




What I'd Do Differently
Real user testing.
Even one informal session with someone who has a speech disability would have sharpened the response options and interaction flow significantly.
What's Next
Built to scale.
The interaction model expands into other professions, other languages, and eventually into Android's native accessibility framework as a system-level layer.
