A Google Chat app that accepts an image of a UI mockup, uses Gemini to generate valid Google Chat Cards v2 JSON, renders the card back to the user, and supports natural-language refinements against the previous result. Self-corrects when its own output fails to render.
This doc covers how it's built and why each choice was made, so the same app can be rebuilt internally. An appendix at the end captures specific gotchas encountered during implementation.
🏗️ Architecture
High-level shape
Four concerns, handled by separate components:
- Sync handler — accepts Chat events, returns within Chat's 30-second window
- Async worker — does the slow work (image download, LLM call, card rendering) without timeout pressure
- Persistent state — per-space conversation memory so refinements build on previous results
- LLM — Gemini for both the core conversion and the supporting "personality" text
The sync and async pieces are the most important architectural call. Google Chat demands a response within 30 seconds or the user sees a "not responding" error — and LLM calls on dense mockups can push 20-30s on their own, before you add attachment download, JSON parsing, and follow-up posting. Trying to do everything synchronously will fail intermittently, and the failures are invisible from the app's side (the function may complete successfully but Chat has already given up).
Going async fixed that permanently. The sync handler returns an immediate "Working on it..." ack within 1-2 seconds, enqueues a Cloud Tasks task, and exits. The worker picks up the task, does the LLM work with no timeout pressure, and posts the final rendered card and JSON as separate Chat messages via the Chat API.
Concrete stack
- Runtime — Cloud Run Function Gen2, Python 3.12, `us-central1`
- Entry point — Single function, URL-routed: `POST /` (handler), `POST /worker` (worker)
- Queue — Cloud Tasks, `mockup-conversions` queue, HTTP target with OIDC auth
- State — Firestore Native mode, default DB, `us-central1`, one doc per space
- LLM — Vertex AI · Gemini 2.5 Pro for cards, Gemini 2.5 Flash for personality
- Function config — `--timeout=120s --cpu=1 --memory=512Mi --min-instances=1`
The `--min-instances=1` flag is load-bearing. Without it, every first request after idle paid a 10-15s cold-start penalty, which wiped out any latency budget before the LLM even ran. With it, the container is always warm.
Request lifecycle
Fresh mockup upload
- Chat → `POST /` (handler) with `messagePayload` containing an image attachment reference
- Handler generates five voice-consistent UI strings via a single Flash call (~1-2s): the ack, an "extended ack" for long-running work, an "updated" badge, a "json intro" line, and a self-correction message
- Handler enqueues a Cloud Task with the original event payload + the generated personality strings
- Handler returns the ack string as the sync response to Chat
- Cloud Tasks dispatches the task to `POST /worker` ~500ms later
- Worker extracts personality, starts a background thread running: download attachment via Chat media API → call Gemini Pro with the attachment bytes + a large system-instruction prompt containing the full Cards v2 reference doc
- A timer watches for the 10-second threshold. If Gemini hasn't returned by then, the worker posts the "extended ack" ("Still working on it...") as a second message
- Gemini returns JSON. Worker attempts to post the rendered card via the Chat API
- If Chat rejects the JSON (hallucinated field, wrong widget type, etc.), the worker posts the self-correction message, sends the broken JSON + Chat's specific error message back to Gemini with a "fix this" prompt, and retries. One retry only, then it gives up with a friendly message
- Worker posts the final JSON as a second message with a collapsible code block
- Worker writes the final card JSON to Firestore, keyed by space resource name, timestamped
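The extended-ack timer step can be sketched with a plain `threading.Timer`. The names here (`run_with_extended_ack`, `post_message`) are illustrative, not the app's actual API:

```python
import threading

def run_with_extended_ack(work, post_message, extended_ack_text, threshold_s=10.0):
    """Run `work`; if it hasn't finished within `threshold_s`, post the
    extended-ack message as a follow-up. Sketch only: `work` stands in for
    the slow Gemini call, `post_message` for a Chat API message post."""
    done = threading.Event()

    def maybe_ack():
        if not done.is_set():          # work still running past the threshold
            post_message(extended_ack_text)

    timer = threading.Timer(threshold_s, maybe_ack)
    timer.start()
    try:
        return work()
    finally:
        done.set()
        timer.cancel()                 # no-op if the timer already fired
```

Fast work cancels the timer before it fires, so the user only ever sees the second message when the LLM call genuinely runs long.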
Refinement (text-only @mention with prior state)
Same flow, but the worker calls Gemini with the previous JSON + accumulated refinement instructions + the new instruction, rather than with an image. Each refinement appends to the history. History is stored as a list of instruction strings.
Memory reset: Every new image upload clears state for that space before writing fresh.
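One plausible shape for assembling the refinement prompt (the real prompt also carries the full Cards v2 reference doc as a system instruction; the exact formatting here is an assumption):

```python
def build_refinement_prompt(previous_json: str, history: list[str],
                            new_instruction: str) -> str:
    """Combine the prior card JSON with the accumulated instruction history
    (oldest first) and the newest instruction. Illustrative sketch."""
    lines = ["Previous card JSON:", previous_json, "", "Refinement history:"]
    lines += [f"{i + 1}. {step}" for i, step in enumerate(history)]
    lines += ["", f"New instruction: {new_instruction}"]
    return "\n".join(lines)
```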
State schema
Firestore collection `conversations`, one document per space (doc ID is the sanitized space resource name, e.g. `spaces_AAAAbcdef`):
TTL is 6 hours, enforced at read time (document older than 6 hours returns None). Documents aren't actively deleted — they're just ignored. At higher volume we'd want a server-side TTL policy.
The reference doc
The largest single artifact in the project is a ~22KB hand-curated Cards v2 reference document, injected as a system instruction on every LLM call. It covers response envelopes, card structure, every widget type (including less-documented ones like decoratedText variants with wrapping vs. truncation, collapsible sections with uncollapsibleWidgetsCount, column hard-limits, carousels, chipLists), the complete 28-item knownIcon enum, HTML formatting rules, and a list of specific gotchas observed during development.
This is doing the heavy lifting. Gemini's training data on Cards v2 is incomplete and partially wrong — it invents field names like collapsibleGroup that don't exist. With the reference doc in context, hallucinations drop dramatically. Without it, every third card fails to render.
The reference doc is probably the most valuable single artifact to carry over to any reimplementation. It was compiled from a mix of public Google docs, internal team knowledge, trial and error, and specific corrections about widget behavior contributed during development.
The "personality" system
All user-facing bot messages that aren't error messages or instructions are generated once per request by a single Flash call that returns a JSON object with five voice-consistent strings.
The prompt defines the voice ("warm with slight dry wit, not saccharine, not wacky") and enforces guardrails (no comments on the mockup itself, no fake urgency, vary openers, no emoji spam). Temperature is 0.9 for variety.
Every message builder accepts both an override string and a fallback. If the personality call fails or returns invalid JSON, hardcoded defaults kick in. The user never knows if the personality call failed.
This was a decision made deliberately: users uploading 10 mockups in a row should not see the same "Working on it..." 10 times. Designers especially notice stale copy. One generation call covers an entire request lifecycle, including the long-tail extended-ack and self-correction strings that might never fire — cheap insurance for tonal consistency when they do.
⚖️ Key design decisions
01. Async via Cloud Tasks, not threading or streaming
Chat has a 30-second sync response ceiling. Options considered:
- Stay sync, optimize hard: Explored. Got to 15-25s with Flash + warm container, but failures were frequent because anything above ~28s blows the limit and there's no way to know from the function side that it happened.
- Background thread from handler: Not reliable in Cloud Run. Container can be reaped after the HTTP response is sent.
- Streaming HTTP response: Chat's ingestion of responses doesn't support streaming in a useful way.
- Separate Pub/Sub or Cloud Tasks: Clean architectural boundary. Chose Cloud Tasks because it's explicitly designed for this pattern (HTTP-target tasks with OIDC auth) and doesn't require a message consumer separate from the main function.
Same-function routing (handler and worker in one deploy, different URL paths) avoided the complexity of maintaining two separate functions with duplicated deps. One codebase, one image, one deploy cycle.
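The same-function routing amounts to a small path dispatch at the entry point. A sketch of the pattern, not the actual framework wiring:

```python
def dispatch(path: str, handlers: dict):
    """One deploy, one entry point: fan out by request path to the sync
    handler or the async worker. Handler names are illustrative."""
    fn = handlers.get(path)
    if fn is None:
        return ("not found", 404)
    return fn()
```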
02. Gemini Pro for card generation, Flash for personality
Pro was the original choice. We briefly switched to Flash when fighting the 30s sync ceiling — Flash shaved ~5s off — and the quality drop was visible. Flash hallucinates widget types more often (which is how the self-correction loop got exercised), flattens columns that should be preserved, and misses details like "this section was visibly collapsible in the mockup."
Once async removed the 30s pressure, the 5-second savings didn't matter, so we went back to Pro. Latency went up a bit, quality went up a lot.
For the personality system, Flash is genuinely fine — it's generating 5 short strings with a tight voice definition, not structured output against a complex schema. The speed is nice and we don't care about the quality difference on this task.
03. Per-space state, not per-user or per-thread
The natural options:
- Per-thread: Cleanest isolation, but requires users to reply in-thread. Inline threading in Chat isn't used heavily enough to build a product assumption on.
- Per-user-in-space: Good for group spaces where multiple people work independently. More complicated to reason about.
- Per-space: Simplest. Works perfectly for DMs (the primary use case). In group spaces, two simultaneous users clobber each other's state, but this is a POC — group usage is not the target path.
Chose per-space. If we wanted to promote this to production, per-thread-with-fallback-to-per-space would be the right refinement.
04. Self-correction loop
Even with the reference doc, Gemini occasionally invents widget types (collapsibleGroup, which doesn't exist, was a real case) or structures things wrong. Chat's response when rejecting a card is specific and useful: "Unknown name 'collapsibleGroup' at 'message.cards_v2[0].card.sections[1].widgets[1]': Cannot find field." That's a clear enough signal that another Gemini call with [original JSON] + [this error] + "fix it" resolves the majority of these cases.
So the worker does this: try the post, and if it fails, surface a friendly "oops" message to the user, run the fix prompt, and retry. One retry only to prevent loops. If the fix also fails, we give up gracefully with a message telling the user to rephrase.
The important UX detail: the user sees this as "the bot tried, caught itself, fixed it" rather than as a failure.
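The retry-once shape can be sketched with injected callables standing in for the Chat API post, the Gemini fix call, and the user-facing notification (the messages shown are illustrative):

```python
def post_with_self_correction(card_json, post_card, fix_json, notify):
    """Try to post the card; on rejection, tell the user, feed Chat's error
    text back to the model, and retry exactly once. Sketch with injected
    callables, not the app's real function signatures."""
    try:
        return post_card(card_json)
    except Exception as err:   # Chat rejected the card; err carries its message
        notify("Oops, that card didn't work right. Working on a fix...")
        fixed = fix_json(card_json, str(err))
        try:
            return post_card(fixed)
        except Exception:
            notify("Still couldn't render that one. Try rephrasing.")
            return None
```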
05. Separate JSON message after the rendered card
The user-visible result is two cards: rendered card first, then a JSON card below. Google Chat doesn't allow interleaving text and cards within a single message, so the worker posts two separate messages:
- Rendered card (with "🔄 Updated" badge if refinement)
- "And the JSON:" text + JSON card (collapsible section containing a `<pre><code>` block of the raw JSON)
This requires Chat API auth to post messages — the first message can't just be the sync response, because the sync response arrives last chronologically (it waits for the function to finish, while the Chat API post happens mid-execution). So both messages are posted by the worker via the Chat API.
06. Reference doc included in refinement prompts
Initially we dropped the reference doc on refinement calls to save tokens/latency, reasoning that the previous JSON serves as its own schema example. This caused the collapsibleGroup hallucination. Refinement prompts now include the full reference doc, same as fresh conversions.
The lesson: "use the previous JSON as a schema reference" is not as strong an instruction as "here's the actual schema." Don't trust the LLM to generalize from examples when you have the spec available.
07. Single voice-generation call per request, not per-message
Tempting to generate each personality string on demand — generate the self-correction message only when we actually hit a failure, for example. But that introduces two problems: (a) additional latency at the failure moment (exactly when users already feel something is wrong), and (b) tonal drift between messages because each generation call is independent. Generating all five strings up front in one call costs ~1s and keeps them internally consistent.
📦 Built vs. deferred
Built and working
Deferred
Self-correction handled the common cases in the moment, so the global error-learning loop was punted until real usage data shows what recurs.
Implementation Gotchas
Specific issues encountered during implementation. Some may be unique to the external GCP setup; others are generic to building Chat apps and will show up anywhere. Documented here so they don't have to be rediscovered.
Chat app auth
Chat calling us vs. us calling Chat are different auth surfaces.
For Chat to invoke the function (inbound):
- `chat@system.gserviceaccount.com` needs the `run.invoker` role on the function
- `service-<project_number>@gcp-sa-gsuiteaddons.iam.gserviceaccount.com` also needs `run.invoker` (this is the actual caller for the Workspace Add-on event format, which is what modern Chat apps use)
For the function to post messages back to Chat (outbound):
- The runtime service account needs to request the `chat.bot` OAuth scope when building the Chat API client
- Use default credentials with explicit scope: `google.auth.default(scopes=["https://www.googleapis.com/auth/chat.bot"])`
Neither direction is well-documented. Both are needed. The inbound permissions especially are easy to miss because the failure mode is just "nothing happens" — Chat silently fails to deliver the event and there's no log on the app's side.
Event payload shapes
Two event formats exist: legacy Chat bot and Workspace Add-on. Modern apps are expected to use the Add-on format. Key payload paths:
Workspace Add-on format:
- `event.chat.messagePayload.message` — user message with `attachment[]`, `argumentText`, etc.
- `event.chat.addedToSpacePayload` — on install
- `event.chat.appCommandPayload` — on slash command
- Response envelope: `{"hostAppDataAction": {"chatDataAction": {"createMessageAction": {"message": {...}}}}}`
The legacy format has a different shape (event.type == "MESSAGE", etc.). Code should handle both defensively in case of edge cases or format fallbacks.
Slash command IDs come back as integers, not strings. The field appCommandPayload.appCommandMetadata.appCommandId returns an int like 1, not a string "1". Compare as int, not string. This one took a round-trip to debug because the code "looked right."
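A defensive comparison helper avoids the int-vs-string trap. The payload path follows the Add-on format above; the helper name is ours:

```python
def matches_command(payload: dict, command_id) -> bool:
    """Normalize appCommandId (arrives as an int, e.g. 1, not "1") before
    comparing, so a string constant elsewhere can't silently never match."""
    raw = (payload.get("appCommandMetadata") or {}).get("appCommandId")
    try:
        return int(raw) == int(command_id)
    except (TypeError, ValueError):
        return False
```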
Attachment handling
Downloading attachments requires the Chat media API, and its HTTP transport has no timeout by default. If the download hangs, it hangs forever — until the function's wall-clock kill.
Fix: wrap the httplib2.Http client used by the Chat API library with an explicit timeout:
import httplib2
import google_auth_httplib2
from googleapiclient.discovery import build

# credentials from google.auth.default(scopes=["https://www.googleapis.com/auth/chat.bot"])
http = google_auth_httplib2.AuthorizedHttp(
    credentials, http=httplib2.Http(timeout=20)
)
service = build("chat", "v1", http=http, cache_discovery=False)
You'll see a log warning ("httplib2 transport does not support per-request timeout") when you use `MediaIoBaseDownload`. It refers to per-request timeouts specifically; the transport-level timeout set above still applies. The warning is benign and isn't the cause of any hangs.
Function latency traps
- Default function CPU/memory is underpowered for LLM SDKs. Out of the box, Cloud Run Functions Gen2 gets `cpu: 0.1666, memory: 256Mi`. The Gemini SDK alone takes significant CPU to initialize; requests can spend 10+ seconds just in library overhead. Bumping to `--cpu=1 --memory=512Mi` cut per-request latency roughly in half.
- Cold starts add ~10-15 seconds. `--min-instances=1` eliminates this. Without it, the first request after ~15 minutes idle will cold-start and blow any latency budget.
- Firestore client initialization is slow on first request. Expect ~3-5 seconds the first time `firestore.Client()` is called in a container. Initialize lazily and cache the client at module level.
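The lazy-init pattern is a module-level singleton; `factory` here is an injectable stand-in so the sketch runs without GCP deps (the real code would use `firestore.Client`):

```python
_client = None

def get_client(factory):
    """Pay the ~3-5s Firestore init cost once per container, on first use.
    `factory` stands in for firestore.Client; sketch, not the app's code."""
    global _client
    if _client is None:
        _client = factory()
    return _client
```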
Cards v2 schema quirks
A running list of widget/behavior specifics worth remembering:
- `decoratedText.bottomLabel` truncates; `bottomLabelText` wraps. Not documented clearly. For wrapping labels, use the `...Text` variants.
- Columns are hard-limited to 2. For 3+ column layouts, use `grid` with `columnCount`.
- Dividers inside a section are inset. Dividers between sections are edge-to-edge and automatic — do not add a divider widget between sections; it'll double up.
- `textParagraph` supports `maxLines` for expandable "Show more" behavior. Not prominent in the public docs.
- Collapsibility is a section property, not a widget. Use `collapsible: true` and `uncollapsibleWidgetsCount: N` on the section object. There is no `collapsibleGroup` widget, no matter what the LLM tries.
- `fixedFooter` only works in dialogs, not in cards sent to spaces. Chat may silently ignore it or reject the whole card.
- Cards cap at 100 widgets total. Chat silently drops anything past that.
- The valid `knownIcon` enum is ~28 entries. LLMs invent new ones; always validate against the list.
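A cheap guard worth adding before posting: validate icons against the enum and substitute a safe fallback. The subset listed here is illustrative; in practice, use the full ~28-entry list from the reference doc:

```python
# Illustrative subset of the knownIcon enum, not the full list.
KNOWN_ICONS = {"STAR", "CLOCK", "EMAIL", "PERSON", "DESCRIPTION", "BOOKMARK"}

def sanitize_icon(icon: str, fallback: str = "STAR") -> str:
    """Swap any hallucinated knownIcon value for a safe fallback instead
    of letting Chat reject the whole card over one bad enum value."""
    return icon if icon in KNOWN_ICONS else fallback
```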
Org policy issues (probably external-only)
These bit us externally and may not apply internally, but noting them in case:
- `iam.automaticIamGrantsForDefaultServiceAccounts` — when on, silently reverts IAM grants to default service accounts. Needed to be overridden at project level.
- `iam.allowedPolicyMemberDomains` — when restrictive, blocks granting roles to service accounts from other domains (like `chat@system.gserviceaccount.com`). Needed project-level override to "Allow all."
- The user applying these overrides needs `roles/orgpolicy.policyAdmin` at the org level, which was not discoverable in the Cloud Console UI — had to be granted via `gcloud`.
The signal that this is biting you: you grant an IAM role, get-iam-policy shows it as not present afterward, and you're confused why. That's silent reversion from an org policy.
Service account permissions required
The function's runtime service account ended up needing all of these. Each was discovered by hitting a failure and looking at the error:
- `aiplatform.user` — call Gemini via Vertex AI
- `cloudbuild.builds.builder` — function deploy
- `cloudtasks.enqueuer` — enqueue tasks
- `datastore.user` — read/write Firestore
- `run.invoker` — Cloud Tasks invoking the worker endpoint via OIDC
- `iam.serviceAccountUser` — granted on the service account itself, so it can "act as" itself to sign OIDC tokens. Non-obvious: the error is "principal lacks `iam.serviceAccounts.actAs`" but the fix is a binding on the service account, not on the project.
Distribution wall
The Visibility setting in the Chat API config accepts only emails in your project's own domain. An app hosted in one Workspace domain cannot be installed by users in another Workspace domain without either (a) admin-level allowlisting at the target domain or (b) publishing to the Google Workspace Marketplace.
This is enforced in the Cloud Console UI itself — you cannot add cross-domain emails even if you want to.
This is the hard wall that prevents external distribution to Google Workspace users. Internal hosting avoids it.
Travis's Direction & Product Calls
Product decisions and directional calls Travis made during the build. The implementation was done with Claude as a paired engineering collaborator; the product shape — what got built, what got cut, and how it should feel — was directed by Travis throughout.
Product scope & direction
- The original spec. Defined the end-to-end concept: @mention + image returns rendered card + JSON, @mention with feedback refines the card, error-learning loop, help command.
- Choosing Gemini as the LLM. Wanted to stay in the Google ecosystem.
- Reframing the refinement feature. When Claude initially recommended deferring the refinement loop in favor of improving first-shot accuracy, Travis pushed back: "User mocks aren't perfect and without that perfection it's hard for AI to get it right. Until we have perfect mocks the user needs to be able to give that feedback. And if we have perfect mocks we don't need this app." This reframed refinement from "fix" to "core loop" and changed the product shape.
- Deciding to build async correctly, not minimally. When weighing a small async patch against true async via Cloud Tasks, Travis said: "What's the right change, not the smallest change." This committed the build to Cloud Tasks and removed the 30s ceiling permanently.
- Calling the launch plan off. After hitting the distribution wall, Travis chose to pause the Marketplace push in favor of an internal conversation with his manager rather than spending 2-4 weeks on public Marketplace publishing for what might be better served by internal hosting.
Specific product/UX calls
- Memory scope and behavior. Per-space (not per-thread or per-user). 6-hour TTL — originally 24h was proposed, Travis preferred shorter. New image uploads reset memory. Text-only @mention with no prior card returns a specific error message with Travis's wording: "I don't know which mock you're talking about. Upload a mock and I'll convert it."
- Two-message response pattern. When Claude proposed one message with two cards ("cleaner, no auth setup"), Travis held firm on two separate messages with a text line between them: "Option B is what I'm wanting."
- "And the JSON:" copy. Travis specified the exact text between the rendered card and the JSON card after seeing the first version.
- "🔄 Updated" indicator. Travis's idea for signaling refinements visually.
- Differentiated ack messages. "When it does an iteration let's change the reply message from 'working on your mockup...' to 'iterating on your card...'" — caught the missing nuance and asked for it.
- Personality as a system, not a feature. When Claude scoped the playful ack line as a one-off fast-follow, Travis reframed it: "Optimally this would apply to all the apps replies to make it a more enjoyable experience." This turned a single AI-generated string into a coherent voice system across five message types.
- Voice direction. "Fun and light with the slightest bit of sarcasm." Also caught the repetition problem ("a LOT of the messages start with 'alright,'") which led to the explicit varied-openers guardrail.
- Pick which messages get personality and which don't. Travis called the scope: personality on the ack, updated badge, json intro, and self-correction; hardcoded/clear language on errors, instructions, and help. "The reported messages that aren't critical."
- "Still working..." follow-up threshold. Proposed the 10-second timer idea and confirmed the threshold.
- Middle-path call on three-stage updates. Claude initially recommended true in-place message updates. Travis weighed the latency tradeoff and chose the middle path: keep instant sync ack, post "still working" as a new message at 10s. "I'm ok with not doing update in place. But I would like the 'still working on it' follow up."
- Self-correction UX phrasing. "If the app knows it made an error, instead of just returning an error message can it say 'oops, that card didn't work right. Working on a fix...'?" Reframed the failure message from informational to voiced and active.
- Going back to Gemini Pro. Once async removed the latency constraint, Travis asked: "Would going back to Gemini 2.5 pro possibly fix this?" and approved the switch back to Pro for better card generation quality.
- Welcome message copy and structure. Wrote the welcome message content, including the bullet list structure and the space-vs-DM variants. Added the request for emojis per bullet for visual rhythm.
- Help and welcome as one thing. "I think the help message and welcome message might be the same thing." Simpler and truer to how the user experiences it.
Cards v2 domain expertise
Travis is the owner of the external Chat UI Kit documentation, and contributed corrections and additions to the reference doc that materially improved card-generation quality:
- `textParagraph.maxLines` for expandable "Show more" behavior — not in prominent public docs; Travis flagged it.
- `decoratedText` field variants — `bottomLabelText`/`contentText` wrap, while `bottomLabel`/`text` truncate. An important distinction for correct rendering that Gemini would not otherwise know.
- Divider behavior. "Divider inside a section is inset; divider between sections is edge-to-edge (achieved automatically — no divider widget needed between sections)." This prevented the LLM from inserting duplicate dividers.
- Columns hard-limited to 2; use grid for 3+. Caught the LLM's tendency to invent 3-column layouts.
- General review of the reference doc for accuracy against his domain knowledge.
Pushbacks and corrections to Claude's approach
- Corrected scope on the refinement feature — from nice-to-have to core.
- Corrected scope on the personality system — from one-off to systemic.
- Pushed for the real architectural change rather than the minimal patch when addressing the 30s timeout.
- Rejected a proposed "clickable link to the DM" in the space-install welcome message when Claude surfaced that cross-user DM links don't work reliably. Accepted the "check your Chat sidebar" fallback instead.
- Rejected excess scope on the help card. Claude proposed a richly-formatted card with images; Travis specified: "I don't think we need an image, just text."
- Rejected the cost/time framing on an earlier draft of this document. The doc is about how it was built, not about selling the idea.
Testing & debugging direction
Throughout the build, Travis ran tests after each deploy, captured logs when things failed, and interpreted the UX of what actually landed in Chat (e.g. noticing the JSON message rendered before the card when messages arrived out of order, catching when the AI response started every message with "Alright,"). Most of the iteration cycles were driven by specific product observations Travis made in real use, not by theoretical problems Claude anticipated.