A Google Chat app that accepts an image of a UI mockup, uses Gemini to generate valid Google Chat Cards v2 JSON, renders the card back to the user, and supports natural-language refinements against the previous result. Self-corrects when its own output fails to render.
This doc covers how it's built and why each choice was made, so the same app can be rebuilt internally. An appendix at the end captures specific gotchas encountered during implementation.
🏗️ Architecture
High-level shape
Four concerns, handled by separate components:
- Sync handler — accepts Chat events, returns within Chat's 30-second window
- Async worker — does the slow work (image download, LLM call, card rendering) without timeout pressure
- Persistent state — per-space conversation memory so refinements build on previous results
- LLM — Gemini for both the core conversion and the supporting "personality" text
The sync and async pieces are the most important architectural call. Google Chat demands a response within 30 seconds or the user sees a "not responding" error — and LLM calls on dense mockups can push 20-30s on their own, before you add attachment download, JSON parsing, and follow-up posting. Trying to do everything synchronously will fail intermittently, and the failures are invisible from the app's side (the function may complete successfully but Chat has already given up).
Going async fixed that permanently. The sync handler returns an immediate "Working on it..." ack within 1-2 seconds, enqueues a Cloud Tasks task, and exits. The worker picks up the task, does the LLM work with no timeout pressure, and posts the final rendered card and JSON as separate Chat messages via the Chat API.
Concrete stack
- Runtime — Cloud Run Function Gen2, Python 3.12, `us-central1`
- Entry point — Single function, URL-routed: `POST /` (handler), `POST /worker` (worker)
- Queue — Cloud Tasks, `mockup-conversions` queue, HTTP target with OIDC auth
- State — Firestore Native mode, default DB, `us-central1`, one doc per space
- LLM — Vertex AI · Gemini 2.5 Pro for cards, Gemini 2.5 Flash for personality
- Function config — `--timeout=120s --cpu=1 --memory=512Mi --min-instances=1`
The `--min-instances=1` flag is load-bearing. Without it, every first request after idle paid a 10-15s cold-start penalty, which wiped out any latency budget before the LLM even ran. With it, the container is always warm.
Request lifecycle
Fresh mockup upload
- Chat → `POST /` (handler) with `messagePayload` containing an image attachment reference
- Handler generates five voice-consistent UI strings via a single Flash call (~1-2s): the ack, an "extended ack" for long-running work, an "updated" badge, a "json intro" line, and a self-correction message
- Handler enqueues a Cloud Task with the original event payload + the generated personality strings
- Handler returns the ack string as the sync response to Chat
- Cloud Tasks dispatches the task to `POST /worker` ~500ms later
- Worker extracts personality, starts a background thread running: download attachment via Chat media API → call Gemini Pro with the attachment bytes + a large system-instruction prompt containing the full Cards v2 reference doc
- A timer watches for the 10-second threshold. If Gemini hasn't returned by then, the worker posts the "extended ack" ("Still working on it...") as a second message
- Gemini returns JSON. Worker attempts to post the rendered card via the Chat API
- If Chat rejects the JSON (hallucinated field, wrong widget type, etc.), the worker posts the self-correction message, sends the broken JSON + Chat's specific error message back to Gemini with a "fix this" prompt, and retries. One retry only, then it gives up with a friendly message
- Worker posts the final JSON as a second message with a collapsible code block
- Worker writes the final card JSON to Firestore, keyed by space resource name, timestamped
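The extended-ack timer step can be sketched with a plain `threading.Timer`. The names here (`run_with_extended_ack`, `post_message`) are illustrative, not the app's actual API:

```python
import threading

def run_with_extended_ack(work, post_message, extended_ack_text, threshold_s=10.0):
    """Run `work`; if it hasn't finished within `threshold_s`, post the
    extended-ack message as a follow-up. Sketch only: `work` stands in for
    the slow Gemini call, `post_message` for a Chat API message post."""
    done = threading.Event()

    def maybe_ack():
        if not done.is_set():          # work still running past the threshold
            post_message(extended_ack_text)

    timer = threading.Timer(threshold_s, maybe_ack)
    timer.start()
    try:
        return work()
    finally:
        done.set()
        timer.cancel()                 # no-op if the timer already fired
```

Fast work cancels the timer before it fires, so the user only ever sees the second message when the LLM call genuinely runs long.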
Refinement (text-only @mention with prior state)
Same flow, but the worker calls Gemini with the previous JSON + accumulated refinement instructions + the new instruction, rather than with an image. Each refinement appends to the history. History is stored as a list of instruction strings.
Memory reset: Every new image upload clears state for that space before writing fresh.
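One plausible shape for assembling the refinement prompt (the real prompt also carries the full Cards v2 reference doc as a system instruction; the exact formatting here is an assumption):

```python
def build_refinement_prompt(previous_json: str, history: list[str],
                            new_instruction: str) -> str:
    """Combine the prior card JSON with the accumulated instruction history
    (oldest first) and the newest instruction. Illustrative sketch."""
    lines = ["Previous card JSON:", previous_json, "", "Refinement history:"]
    lines += [f"{i + 1}. {step}" for i, step in enumerate(history)]
    lines += ["", f"New instruction: {new_instruction}"]
    return "\n".join(lines)
```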
State schema
Firestore collection `conversations`, one document per space (doc ID is the sanitized space resource name, e.g. `spaces_AAAAbcdef`):
TTL is 6 hours, enforced at read time (document older than 6 hours returns None). Documents aren't actively deleted — they're just ignored. At higher volume we'd want a server-side TTL policy.
The reference doc
The largest single artifact in the project is a ~22KB hand-curated Cards v2 reference document, injected as a system instruction on every LLM call. It covers response envelopes, card structure, every widget type (including less-documented ones like decoratedText variants with wrapping vs. truncation, collapsible sections with uncollapsibleWidgetsCount, column hard-limits, carousels, chipLists), the complete 28-item knownIcon enum, HTML formatting rules, and a list of specific gotchas observed during development.
This is doing the heavy lifting. Gemini's training data on Cards v2 is incomplete and partially wrong — it invents field names like collapsibleGroup that don't exist. With the reference doc in context, hallucinations drop dramatically. Without it, every third card fails to render.
The reference doc is probably the most valuable single artifact to carry over to any reimplementation. It was compiled from a mix of public Google docs, internal team knowledge, trial and error, and specific corrections about widget behavior contributed during development.
The "personality" system
All user-facing bot messages that aren't error messages or instructions are generated once per request by a single Flash call that returns a JSON object with five voice-consistent strings.
The prompt defines the voice ("warm with slight dry wit, not saccharine, not wacky") and enforces guardrails (no comments on the mockup itself, no fake urgency, vary openers, no emoji spam). Temperature is 0.9 for variety.
Every message builder accepts both an override string and a fallback. If the personality call fails or returns invalid JSON, hardcoded defaults kick in. The user never knows if the personality call failed.
This was a decision made deliberately: users uploading 10 mockups in a row should not see the same "Working on it..." 10 times. Designers especially notice stale copy. One generation call covers an entire request lifecycle, including the long-tail extended-ack and self-correction strings that might never fire — cheap insurance for tonal consistency when they do.
⚖️ Key design decisions
01. Async via Cloud Tasks, not threading or streaming
Chat has a 30-second sync response ceiling. Options considered:
- Stay sync, optimize hard: Explored. Got to 15-25s with Flash + warm container, but failures were frequent because anything above ~28s blows the limit and there's no way to know from the function side that it happened.
- Background thread from handler: Not reliable in Cloud Run. Container can be reaped after the HTTP response is sent.
- Streaming HTTP response: Chat's ingestion of responses doesn't support streaming in a useful way.
- Separate Pub/Sub or Cloud Tasks: Clean architectural boundary. Chose Cloud Tasks because it's explicitly designed for this pattern (HTTP-target tasks with OIDC auth) and doesn't require a message consumer separate from the main function.
Same-function routing (handler and worker in one deploy, different URL paths) avoided the complexity of maintaining two separate functions with duplicated deps. One codebase, one image, one deploy cycle.
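The same-function routing amounts to a small path dispatch at the entry point. A sketch of the pattern, not the actual framework wiring:

```python
def dispatch(path: str, handlers: dict):
    """One deploy, one entry point: fan out by request path to the sync
    handler or the async worker. Handler names are illustrative."""
    fn = handlers.get(path)
    if fn is None:
        return ("not found", 404)
    return fn()
```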
02. Gemini Pro for card generation, Flash for personality
Pro was the original choice. We briefly switched to Flash when fighting the 30s sync ceiling — Flash shaved ~5s off — and the quality drop was visible. Flash hallucinates widget types more often (which is how the self-correction loop got exercised), flattens columns that should be preserved, and misses details like "this section was visibly collapsible in the mockup."
Once async removed the 30s pressure, the 5-second savings didn't matter, so we went back to Pro. Latency went up a bit, quality went up a lot.
For the personality system, Flash is genuinely fine — it's generating 5 short strings with a tight voice definition, not structured output against a complex schema. The speed is nice and we don't care about the quality difference on this task.
03. Per-space state, not per-user or per-thread
The natural options:
- Per-thread: Cleanest isolation, but requires users to reply in-thread. Inline threading in Chat isn't used heavily enough to build a product assumption on.
- Per-user-in-space: Good for group spaces where multiple people work independently. More complicated to reason about.
- Per-space: Simplest. Works perfectly for DMs (the primary use case). In group spaces, two simultaneous users clobber each other's state, but this is a POC — group usage is not the target path.
Chose per-space. If we wanted to promote this to production, per-thread-with-fallback-to-per-space would be the right refinement.
04. Self-correction loop
Even with the reference doc, Gemini occasionally invents widget types (collapsibleGroup, which doesn't exist, was a real case) or structures things wrong. Chat's response when rejecting a card is specific and useful: "Unknown name 'collapsibleGroup' at 'message.cards_v2[0].card.sections[1].widgets[1]': Cannot find field." That's a clear enough signal that another Gemini call with [original JSON] + [this error] + "fix it" resolves the majority of these cases.
So the worker does this: try the post, and if it fails, surface a friendly "oops" message to the user, run the fix prompt, and retry. One retry only to prevent loops. If the fix also fails, we give up gracefully with a message telling the user to rephrase.
The important UX detail: the user sees this as "the bot tried, caught itself, fixed it" rather than as a failure.
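The retry-once shape can be sketched with injected callables standing in for the Chat API post, the Gemini fix call, and the user-facing notification (the messages shown are illustrative):

```python
def post_with_self_correction(card_json, post_card, fix_json, notify):
    """Try to post the card; on rejection, tell the user, feed Chat's error
    text back to the model, and retry exactly once. Sketch with injected
    callables, not the app's real function signatures."""
    try:
        return post_card(card_json)
    except Exception as err:   # Chat rejected the card; err carries its message
        notify("Oops, that card didn't work right. Working on a fix...")
        fixed = fix_json(card_json, str(err))
        try:
            return post_card(fixed)
        except Exception:
            notify("Still couldn't render that one. Try rephrasing.")
            return None
```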
05. Separate JSON message after the rendered card
The user-visible result is two cards: rendered card first, then a JSON card below. Google Chat doesn't allow interleaving text and cards within a single message, so the worker posts two separate messages:
- Rendered card (with "🔄 Updated" badge if refinement)
- "And the JSON:" text + JSON card (collapsible section containing a `<pre><code>` block of the raw JSON)
This requires Chat API auth to post messages — the first message can't just be the sync response, because the sync response arrives last chronologically (it waits for the function to finish, while the Chat API post happens mid-execution). So both messages are posted by the worker via the Chat API.
06. Reference doc included in refinement prompts
Initially we dropped the reference doc on refinement calls to save tokens/latency, reasoning that the previous JSON serves as its own schema example. This caused the collapsibleGroup hallucination. Refinement prompts now include the full reference doc, same as fresh conversions.
The lesson: "use the previous JSON as a schema reference" is not as strong an instruction as "here's the actual schema." Don't trust the LLM to generalize from examples when you have the spec available.
07. Single voice-generation call per request, not per-message
Tempting to generate each personality string on demand — generate the self-correction message only when we actually hit a failure, for example. But that introduces two problems: (a) additional latency at the failure moment (exactly when users already feel something is wrong), and (b) tonal drift between messages because each generation call is independent. Generating all five strings up front in one call costs ~1s and keeps them internally consistent.
📦 Built vs. deferred
Built and working
Deferred
Self-correction handled the common cases in the moment, so the global error-learning loop was punted until real usage data shows what recurs.
Implementation Gotchas
Specific issues encountered during implementation. Some may be unique to the external GCP setup; others are generic to building Chat apps and will show up anywhere. Documented here so they don't have to be rediscovered.
Chat app auth
Chat calling us vs. us calling Chat are different auth surfaces.
For Chat to invoke the function (inbound):
- `chat@system.gserviceaccount.com` needs the `run.invoker` role on the function
- `service-<project_number>@gcp-sa-gsuiteaddons.iam.gserviceaccount.com` also needs `run.invoker` (this is the actual caller for the Workspace Add-on event format, which is what modern Chat apps use)
For the function to post messages back to Chat (outbound):
- The runtime service account needs to request the `chat.bot` OAuth scope when building the Chat API client
- Use default credentials with explicit scope: `google.auth.default(scopes=["https://www.googleapis.com/auth/chat.bot"])`
Neither direction is well-documented. Both are needed. The inbound permissions especially are easy to miss because the failure mode is just "nothing happens" — Chat silently fails to deliver the event and there's no log on the app's side.
Event payload shapes
Two event formats exist: legacy Chat bot and Workspace Add-on. Modern apps are expected to use the Add-on format. Key payload paths:
Workspace Add-on format:
- `event.chat.messagePayload.message` — user message with `attachment[]`, `argumentText`, etc.
- `event.chat.addedToSpacePayload` — on install
- `event.chat.appCommandPayload` — on slash command
- Response envelope: `{"hostAppDataAction": {"chatDataAction": {"createMessageAction": {"message": {...}}}}}`
The legacy format has a different shape (event.type == "MESSAGE", etc.). Code should handle both defensively in case of edge cases or format fallbacks.
Slash command IDs come back as integers, not strings. The field appCommandPayload.appCommandMetadata.appCommandId returns an int like 1, not a string "1". Compare as int, not string. This one took a round-trip to debug because the code "looked right."
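A defensive comparison helper avoids the int-vs-string trap. The payload path follows the Add-on format above; the helper name is ours:

```python
def matches_command(payload: dict, command_id) -> bool:
    """Normalize appCommandId (arrives as an int, e.g. 1, not "1") before
    comparing, so a string constant elsewhere can't silently never match."""
    raw = (payload.get("appCommandMetadata") or {}).get("appCommandId")
    try:
        return int(raw) == int(command_id)
    except (TypeError, ValueError):
        return False
```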
Attachment handling
Downloading attachments requires the Chat media API, and its HTTP transport has no timeout by default. If the download hangs, it hangs forever — until the function's wall-clock kill.
Fix: wrap the httplib2.Http client used by the Chat API library with an explicit timeout:
import httplib2
import google_auth_httplib2
from googleapiclient.discovery import build

# credentials from google.auth.default(scopes=["https://www.googleapis.com/auth/chat.bot"])
http = google_auth_httplib2.AuthorizedHttp(
    credentials, http=httplib2.Http(timeout=20)
)
service = build("chat", "v1", http=http, cache_discovery=False)
You'll see a log warning ("httplib2 transport does not support per-request timeout") when you use `MediaIoBaseDownload`. It refers to per-request timeouts specifically; the transport-level timeout set above still applies. The warning is benign and isn't the cause of any hangs.
Function latency traps
- Default function CPU/memory is underpowered for LLM SDKs. Out of the box, Cloud Run Functions Gen2 gets `cpu: 0.1666, memory: 256Mi`. The Gemini SDK alone takes significant CPU to initialize; requests can spend 10+ seconds just in library overhead. Bumping to `--cpu=1 --memory=512Mi` cut per-request latency roughly in half.
- Cold starts add ~10-15 seconds. `--min-instances=1` eliminates this. Without it, the first request after ~15 minutes idle will cold-start and blow any latency budget.
- Firestore client initialization is slow on first request. Expect ~3-5 seconds the first time `firestore.Client()` is called in a container. Initialize lazily and cache the client at module level.
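The lazy-init pattern is a module-level singleton; `factory` here is an injectable stand-in so the sketch runs without GCP deps (the real code would use `firestore.Client`):

```python
_client = None

def get_client(factory):
    """Pay the ~3-5s Firestore init cost once per container, on first use.
    `factory` stands in for firestore.Client; sketch, not the app's code."""
    global _client
    if _client is None:
        _client = factory()
    return _client
```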
Cards v2 schema quirks
A running list of widget/behavior specifics worth remembering:
- `decoratedText.bottomLabel` truncates; `bottomLabelText` wraps. Not documented clearly. For wrapping labels, use the `...Text` variants.
- Columns are hard-limited to 2. For 3+ column layouts, use `grid` with `columnCount`.
- Dividers inside a section are inset. Dividers between sections are edge-to-edge and automatic — do not add a divider widget between sections; it'll double up.
- `textParagraph` supports `maxLines` for expandable "Show more" behavior. Not prominent in the public docs.
- Collapsibility is a section property, not a widget. Use `collapsible: true` and `uncollapsibleWidgetsCount: N` on the section object. There is no `collapsibleGroup` widget, no matter what the LLM tries.
- `fixedFooter` only works in dialogs, not in cards sent to spaces. Chat may silently ignore it or reject the whole card.
- Cards cap at 100 widgets total. Chat silently drops anything past that.
- The valid `knownIcon` enum is ~28 entries. LLMs invent new ones; always validate against the list.
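A cheap guard worth adding before posting: validate icons against the enum and substitute a safe fallback. The subset listed here is illustrative; in practice, use the full ~28-entry list from the reference doc:

```python
# Illustrative subset of the knownIcon enum, not the full list.
KNOWN_ICONS = {"STAR", "CLOCK", "EMAIL", "PERSON", "DESCRIPTION", "BOOKMARK"}

def sanitize_icon(icon: str, fallback: str = "STAR") -> str:
    """Swap any hallucinated knownIcon value for a safe fallback instead
    of letting Chat reject the whole card over one bad enum value."""
    return icon if icon in KNOWN_ICONS else fallback
```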
Org policy issues (probably external-only)
These bit us externally and may not apply internally, but noting them in case:
- `iam.automaticIamGrantsForDefaultServiceAccounts` — when on, silently reverts IAM grants to default service accounts. Needed to be overridden at project level.
- `iam.allowedPolicyMemberDomains` — when restrictive, blocks granting roles to service accounts from other domains (like `chat@system.gserviceaccount.com`). Needed project-level override to "Allow all."
- The user applying these overrides needs `roles/orgpolicy.policyAdmin` at the org level, which was not discoverable in the Cloud Console UI — had to be granted via `gcloud`.
The signal that this is biting you: you grant an IAM role, get-iam-policy shows it as not present afterward, and you're confused why. That's silent reversion from an org policy.
Service account permissions required
The function's runtime service account ended up needing all of these. Each was discovered by hitting a failure and looking at the error:
- `aiplatform.user` — call Gemini via Vertex AI
- `cloudbuild.builds.builder` — function deploy
- `cloudtasks.enqueuer` — enqueue tasks
- `datastore.user` — read/write Firestore
- `run.invoker` — Cloud Tasks invoking the worker endpoint via OIDC
- `iam.serviceAccountUser` — granted on the service account itself, so it can "act as" itself to sign OIDC tokens. Non-obvious: the error is "principal lacks `iam.serviceAccounts.actAs`" but the fix is a binding on the service account, not on the project.
Distribution wall
The Visibility setting in the Chat API config accepts only emails in your project's own domain. An app hosted in one Workspace domain cannot be installed by users in another Workspace domain without either (a) admin-level allowlisting at the target domain or (b) publishing to the Google Workspace Marketplace.
This is enforced in the Cloud Console UI itself — you cannot add cross-domain emails even if you want to.
This is the hard wall that prevents external distribution to Google Workspace users. Internal hosting avoids it.
Travis's Direction & Product Calls
Product decisions and directional calls Travis made during the build. The implementation was done with Claude as a paired engineering collaborator; the product shape — what got built, what got cut, and how it should feel — was directed by Travis throughout.
Product scope & direction
- The original spec. Defined the end-to-end concept: @mention + image returns rendered card + JSON, @mention with feedback refines the card, error-learning loop, help command.
- Choosing Gemini as the LLM. Wanted to stay in the Google ecosystem.
- Reframing the refinement feature. When Claude initially recommended deferring the refinement loop in favor of improving first-shot accuracy, Travis pushed back: "User mocks aren't perfect and without that perfection it's hard for AI to get it right. Until we have perfect mocks the user needs to be able to give that feedback. And if we have perfect mocks we don't need this app." This reframed refinement from "fix" to "core loop" and changed the product shape.
- Deciding to build async correctly, not minimally. When weighing a small async patch against true async via Cloud Tasks, Travis said: "What's the right change, not the smallest change." This committed the build to Cloud Tasks and removed the 30s ceiling permanently.
- Calling the launch plan off. After hitting the distribution wall, Travis chose to pause the Marketplace push in favor of an internal conversation with his manager rather than spending 2-4 weeks on public Marketplace publishing for what might be better served by internal hosting.
Specific product/UX calls
- Memory scope and behavior. Per-space (not per-thread or per-user). 6-hour TTL — originally 24h was proposed, Travis preferred shorter. New image uploads reset memory. Text-only @mention with no prior card returns a specific error message with Travis's wording: "I don't know which mock you're talking about. Upload a mock and I'll convert it."
- Two-message response pattern. When Claude proposed one message with two cards ("cleaner, no auth setup"), Travis held firm on two separate messages with a text line between them: "Option B is what I'm wanting."
- "And the JSON:" copy. Travis specified the exact text between the rendered card and the JSON card after seeing the first version.
- "🔄 Updated" indicator. Travis's idea for signaling refinements visually.
- Differentiated ack messages. "When it does an iteration let's change the reply message from 'working on your mockup...' to 'iterating on your card...'" — caught the missing nuance and asked for it.
- Personality as a system, not a feature. When Claude scoped the playful ack line as a one-off fast-follow, Travis reframed it: "Optimally this would apply to all the apps replies to make it a more enjoyable experience." This turned a single AI-generated string into a coherent voice system across five message types.
- Voice direction. "Fun and light with the slightest bit of sarcasm." Also caught the repetition problem ("a LOT of the messages start with 'alright,'") which led to the explicit varied-openers guardrail.
- Pick which messages get personality and which don't. Travis called the scope: personality on the ack, updated badge, json intro, and self-correction; hardcoded/clear language on errors, instructions, and help. "The reported messages that aren't critical."
- "Still working..." follow-up threshold. Proposed the 10-second timer idea and confirmed the threshold.
- Middle-path call on three-stage updates. Claude initially recommended true in-place message updates. Travis weighed the latency tradeoff and chose the middle path: keep instant sync ack, post "still working" as a new message at 10s. "I'm ok with not doing update in place. But I would like the 'still working on it' follow up."
- Self-correction UX phrasing. "If the app knows it made an error, instead of just returning an error message can it say 'oops, that card didn't work right. Working on a fix...'?" Reframed the failure message from informational to voiced and active.
- Going back to Gemini Pro. Once async removed the latency constraint, Travis asked: "Would going back to Gemini 2.5 pro possibly fix this?" and approved the switch back to Pro for better card generation quality.
- Welcome message copy and structure. Wrote the welcome message content, including the bullet list structure and the space-vs-DM variants. Added the request for emojis per bullet for visual rhythm.
- Help and welcome as one thing. "I think the help message and welcome message might be the same thing." Simpler and truer to how the user experiences it.
Cards v2 domain expertise
Travis is the owner of the external Chat UI Kit documentation, and contributed corrections and additions to the reference doc that materially improved card-generation quality:
- `textParagraph.maxLines` for expandable "Show more" behavior — not in prominent public docs; Travis flagged it.
- `decoratedText` field variants — `bottomLabelText`/`contentText` wrap, while `bottomLabel`/`text` truncate. An important distinction for correct rendering that Gemini would not otherwise know.
- Divider behavior. "Divider inside a section is inset; divider between sections is edge-to-edge (achieved automatically — no divider widget needed between sections)." This prevented the LLM from inserting duplicate dividers.
- Columns hard-limited to 2; use grid for 3+. Caught the LLM's tendency to invent 3-column layouts.
- General review of the reference doc for accuracy against his domain knowledge.
Pushbacks and corrections to Claude's approach
- Corrected scope on the refinement feature — from nice-to-have to core.
- Corrected scope on the personality system — from one-off to systemic.
- Pushed for the real architectural change rather than the minimal patch when addressing the 30s timeout.
- Rejected a proposed "clickable link to the DM" in the space-install welcome message when Claude surfaced that cross-user DM links don't work reliably. Accepted the "check your Chat sidebar" fallback instead.
- Rejected excess scope on the help card. Claude proposed a richly-formatted card with images; Travis specified: "I don't think we need an image, just text."
- Rejected the cost/time framing on an earlier draft of this document. The doc is about how it was built, not about selling the idea.
Testing & debugging direction
Throughout the build, Travis ran tests after each deploy, captured logs when things failed, and interpreted the UX of what actually landed in Chat (e.g. noticing the JSON message rendered before the card when messages arrived out of order, catching when the AI response started every message with "Alright,"). Most of the iteration cycles were driven by specific product observations Travis made in real use, not by theoretical problems Claude anticipated.