Welsh Microsoft Guy
18 March 2026

Building an NLWeb AI Chat on Azure Static Web Apps: Every Lesson Learned

Tags: azure, azure-static-web-apps, azure-functions, github-models, nlweb, next-js, typescript, ci-cd, github-actions, content-model, knowledge-graph, architecture-decisions

I recently rebuilt my personal site from scratch — blog, podcast, knowledge graph, and an AI-powered chat feature using Microsoft's emerging NLWeb protocol. The idea is simple: visitors can ask a natural-language question and get back a grounded, conversational answer pointing them at the most relevant content on the site — rather than a generic search results list.

What followed was one of the most instructive deployment debugging sessions I've had in a while. The architecture decisions were interesting. The content model required proper thought. Getting the AI chat to actually run on Azure Static Web Apps took a lot more persistence than I expected.

This is the full story — from the first architectural decision to the moment it all worked — so you don't have to repeat any of it.

Starting with decisions, not code

The first thing I did was write specs, not code. Before a single file was created I locked down the answers to questions that would otherwise cause expensive rework later.

Hosting: Azure Static Web Apps. Free tier, GitHub Actions CI/CD baked in, PR preview environments, global CDN, managed Functions runtime — and it's where I spend my days professionally so I know it well. The only real alternative I considered was Vercel, but SWA wins on cost and the Azure-native story.

Framework: Next.js 15 with output: 'export'. Static site generation gives me real URLs (/blog/my-post), full SEO metadata control, and no server to manage. The export flag produces a plain out/ directory of HTML, CSS, and JS that SWA serves directly.

Content authoring: Git-based MDX, no admin panel. Every post lives in content/blog/<slug>/index.mdx and every episode in content/tell-your-story/<slug>/index.mdx. Front matter carries all the structured data; the body is Markdown. An admin panel is deferred to v2 — get the authoring workflow validated first, then automate it.

AI chat: keyword search in v1, GitHub Models in v2. I deliberately shipped the NLInterface component backed by a local search index first. This meant the UI was working and tested before I touched any AI infrastructure. The AI layer was then dropped in behind the same /api/ask interface. If the AI call fails, the UI automatically falls back to keyword search — so it's never dead.

These decisions, written down before any code, saved me significant rework.

The content model

With MDX as the authoring format, I needed a structured front matter schema that would serve everything: the content service, the knowledge graph, search, SEO metadata, and eventually the AI prompt catalog.

The base fields every piece of content carries:

```yaml
id: "blog-2026-03-my-post"   # Stable unique identifier
contentType: "blogPost"      # Discriminator
title: "My Post Title"
slug: "my-post"              # URL-safe, immutable
status: "published"          # draft | review | scheduled | published | archived
publishDate: "2026-03-18"
shortDescription: "One or two sentences for cards and meta tags."
primaryTheme: "azure"        # Drives graph colouring
themes: ["azure", "ai"]      # All themes including primary
tags: ["azure", "functions"] # Freeform, for filtering
```

Blog posts extend this with articleFormat, technicalLevel, entitiesTechnologies, entitiesOrganisations, and migratedFrom (for the LinkedIn articles I imported). Podcast episodes extend it with guestName, guestTitle, guestBio, duration, and transcriptPath.

The unified model means a single content service can handle both types — and the knowledge graph can draw edges across them without special-casing.
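
To make that concrete, here's a minimal sketch of how the unified model might look as a TypeScript discriminated union (field names follow the front matter above; the interfaces and the `isEpisode` guard are illustrative, not the site's actual code):

```typescript
// Shared fields every content item carries (mirrors the front matter schema).
interface BaseContent {
  id: string;
  title: string;
  slug: string;
  status: 'draft' | 'review' | 'scheduled' | 'published' | 'archived';
  publishDate: string; // ISO date
  shortDescription: string;
  primaryTheme: string;
  themes: string[];
  tags: string[];
}

interface BlogPost extends BaseContent {
  contentType: 'blogPost'; // Discriminator
  articleFormat?: string;
  technicalLevel?: string;
}

interface PodcastEpisode extends BaseContent {
  contentType: 'podcastEpisode'; // Discriminator
  guestName: string;
  duration?: string;
}

type ContentItem = BlogPost | PodcastEpisode;

// The contentType discriminator lets a single content service narrow safely.
function isEpisode(item: ContentItem): item is PodcastEpisode {
  return item.contentType === 'podcastEpisode';
}
```

With the discriminator in place, the graph builder and content service can branch on `contentType` once and get full type narrowing everywhere else.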

The content pipeline

Content authoring is the source of truth; the generated artefacts are the deployable output. I built four TypeScript scripts, run via tsx:

validate-content.ts — runs on every PR via a GitHub Actions workflow. It checks required fields, ID uniqueness, slug format ([a-z0-9-]+), ISO date validity, primaryTheme membership in themes, relatedContentIds cross-references, and guestName presence on episodes. Exits code 1 on any error. This is the gate that keeps the content index clean.
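
A few of those checks reduce to small pure functions. A sketch of what they might look like (function names are illustrative, not the actual script):

```typescript
// Slugs must be lowercase alphanumerics and hyphens only.
const SLUG_PATTERN = /^[a-z0-9-]+$/;

function isValidSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

// primaryTheme must also appear in the themes array.
function primaryThemeIsListed(primaryTheme: string, themes: string[]): boolean {
  return themes.includes(primaryTheme);
}

// IDs must be unique across the whole content set; return any duplicates found.
function findDuplicateIds(ids: string[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const id of ids) {
    if (seen.has(id)) dupes.add(id);
    seen.add(id);
  }
  return [...dupes];
}
```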

build-content-index.ts — reads every content/**/*.mdx, filters to items with status: published or status: scheduled with a publishDate in the past, strips front matter from the body, and writes public/generated/content-index.json. This is the master data file everything else derives from.
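
The publish filter reduces to a small predicate. A sketch, under the assumption that scheduled items go live once their publishDate has passed:

```typescript
type Status = 'draft' | 'review' | 'scheduled' | 'published' | 'archived';

// An item is deployable if it is published, or scheduled with a date in the past.
function isLive(status: Status, publishDate: string, now: Date = new Date()): boolean {
  if (status === 'published') return true;
  if (status === 'scheduled') return new Date(publishDate) <= now;
  return false; // draft, review, archived never ship
}
```

Injecting `now` as a parameter keeps the predicate deterministic and testable.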

build-graph.ts — reads the content index and builds a knowledge graph. Content items become nodes; themes become nodes; entities (technologies, organisations, people) that appear in two or more pieces of content become nodes. Edges carry type (theme, entity, related) and weight. The graph drives the /explore page.

build-search.ts — produces a lightweight public/generated/search-index.json containing only the fields needed for search and result rendering (no body content). This file is what the AI function fetches.

```bash
npm run build:all-content   # chains all four in sequence
```

The generated files end up in public/generated/, which Next.js's static export copies verbatim into out/generated/. They're then served as plain static assets at /generated/*.json.

The content service pattern

In Next.js static export mode there's no server-side data layer — everything is either static or fetched at runtime from the client. I wrote src/lib/contentService.ts as a module with a simple fetch-and-cache pattern:

```typescript
let contentIndexCache: ContentItem[] | null = null;

async function fetchContentIndex(): Promise<ContentItem[]> {
  if (contentIndexCache) return contentIndexCache;
  const res = await fetch('/generated/content-index.json');
  if (!res.ok) throw new Error(`Failed to fetch content index: ${res.status}`);
  contentIndexCache = await res.json();
  return contentIndexCache;
}
```

Public functions like getArticles(), getEpisodes(), and searchContent() all call through this cache. The cache is module-level state — it persists for the lifetime of the browser session and is cleared between server restarts in development. In practice this means the JSON is fetched once on first use and then served from memory.

For page generation at build time, the content service also runs during next build (where fetch is polyfilled by Next.js), so pages can be statically rendered with real content.

The knowledge graph

The /explore page renders an interactive D3 force simulation of the content graph. I was particularly pleased with how the graph model fell out of the content model naturally — once every piece of content declares its themes and entities, the graph almost builds itself.

The graph has three node types:

  • Content nodes — one per published item, sized by connection count
  • Theme nodes — one per unique theme string across all content
  • Entity nodes — one per technology/organisation/person that appears in ≥ 2 items

And three edge types with different weights:

  • theme (content → theme, weight 0.9)
  • entity (content → entity, weight 0.6)
  • related (content → content, weight 0.7, bidirectional and deduped)
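
Two of those rules are worth spelling out in code: entity nodes only exist at a threshold of two or more appearances, and related edges are bidirectional and must be deduped. A sketch (names and shapes are illustrative, not the actual build-graph.ts):

```typescript
interface Edge {
  source: string;
  target: string;
  type: 'theme' | 'entity' | 'related';
  weight: number;
}

// Entities become nodes only when they appear in two or more content items.
function entityNodes(itemEntities: Record<string, string[]>): string[] {
  const counts = new Map<string, number>();
  for (const entities of Object.values(itemEntities)) {
    // Count each entity once per item, even if listed twice in one item.
    for (const e of new Set(entities)) counts.set(e, (counts.get(e) ?? 0) + 1);
  }
  return [...counts.entries()].filter(([, n]) => n >= 2).map(([e]) => e);
}

// Related edges are bidirectional, so dedupe on a sorted pair key.
function relatedEdges(pairs: Array<[string, string]>): Edge[] {
  const seen = new Set<string>();
  const edges: Edge[] = [];
  for (const [a, b] of pairs) {
    const key = [a, b].sort().join('|');
    if (seen.has(key)) continue;
    seen.add(key);
    edges.push({ source: a, target: b, type: 'related', weight: 0.7 });
  }
  return edges;
}
```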

One deliberate decision: no tag edges in v1. Tags are display-only filtering aids. Including them in the graph would create too much noise — almost every post has azure and microsoft tags, which would produce a densely connected hairball rather than a meaningful topology.

The graph page has a graph/list view toggle, search, a content-type filter, zoom/pan, hover tooltips, and click-to-navigate for content nodes — all backed by the same graph.json static asset.

The AI chat: v1 keyword, v2 AI

The NLInterface component was built in two stages deliberately.

v1 wired the chat UI to searchContent() from the content service — pure keyword matching across title, description, excerpt, tags, and themes. No API calls, no tokens required, works offline. The UI — floating bottom-right overlay, chat message history, result cards, suggestion chips, keyboard accessibility — was all built and tested against this.

v2 replaced the data source with a call to GET /api/ask?query=.... The component tries the API first; if it gets anything other than a 200 it falls back to searchContent() silently. The user always gets a result.

This progressive approach meant I had a working, testable UI before writing a single line of Azure Functions code — and the fallback behaviour was a natural consequence of the two-stage design, not an afterthought.
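
Stripped of React state management, the try-API-then-fallback behaviour reduces to something like this (a simplified sketch; `fetchFn` and `keywordSearch` are injected here purely to keep the example self-contained):

```typescript
interface AskResult {
  message: string;
  results: unknown[];
  source: 'ai' | 'keyword';
}

// Try the AI endpoint first; on any non-200 response or network failure,
// fall back to local keyword search so the user always gets a result.
async function ask(
  query: string,
  fetchFn: typeof fetch,
  keywordSearch: (q: string) => unknown[],
): Promise<AskResult> {
  try {
    const res = await fetchFn(`/api/ask?query=${encodeURIComponent(query)}`);
    if (res.ok) {
      const body = await res.json();
      return { message: body.message, results: body.results ?? [], source: 'ai' };
    }
  } catch {
    // Network error: fall through to keyword search.
  }
  return { message: 'Here is what I found:', results: keywordSearch(query), source: 'keyword' };
}
```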

The NLWeb API function

The function lives in api/src/functions/ask.ts — an Azure Functions v4 TypeScript HTTP trigger:

```typescript
app.http('ask', {
  methods: ['GET', 'POST'],
  authLevel: 'anonymous',
  route: 'ask',
  handler: async (request, context) => { /* ... */ },
});
```

The handler:

  1. Reads ?query= from the request
  2. Loads search-index.json by fetching SITE_URL + /generated/search-index.json
  3. Builds a system prompt containing the full content catalog (title, themes, tags, description per item)
  4. Calls GitHub Models gpt-4o-mini via the OpenAI-compatible endpoint
  5. Parses the JSON response ({ message, relevant: [3, 7, 1] })
  6. Maps the 1-based indices back to full result objects
  7. Returns a NLWeb-shaped response
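
Step 6, mapping 1-based indices back to items, is trivial but easy to get wrong, especially since the model can return out-of-range or duplicated indices. A defensive sketch:

```typescript
// The model returns 1-based indices into the catalog it was shown.
// Map them back to items, silently dropping anything out of range or repeated.
function mapRelevant<T>(items: T[], indices: number[]): T[] {
  const seen = new Set<number>();
  const out: T[] = [];
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 1 || i > items.length || seen.has(i)) continue;
    seen.add(i);
    out.push(items[i - 1]); // convert 1-based to 0-based
  }
  return out;
}
```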

The prev query parameter carries a comma-separated list of previous queries for conversational context — the function injects them as fake prior turns so the model can resolve follow-up questions that depend on earlier ones.
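
Building that message array might look something like this (a sketch; the stub assistant reply and function name are illustrative, not the actual implementation):

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Turn "prev=q1,q2" plus the current query into an OpenAI-style message array,
// injecting each earlier query as a prior user turn with a stub assistant reply.
function buildMessages(systemPrompt: string, prev: string | null, query: string): ChatMessage[] {
  const messages: ChatMessage[] = [{ role: 'system', content: systemPrompt }];
  for (const q of (prev ?? '').split(',').map((s) => s.trim()).filter(Boolean)) {
    messages.push({ role: 'user', content: q });
    messages.push({ role: 'assistant', content: '(answered)' });
  }
  messages.push({ role: 'user', content: query });
  return messages;
}
```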

Standing up CI/CD

The SWA GitHub Action workflow is auto-created by Azure when you create the resource. I needed to adapt it for the content pipeline. The final working configuration:

```yaml
- name: Deploy to Azure Static Web Apps
  uses: Azure/static-web-apps-deploy@v1
  with:
    azure_static_web_apps_api_token: ${{ secrets.AZURE_STATIC_WEB_APPS_API_TOKEN_PROUD_PEBBLE_0F32C4110 }}
    repo_token: ${{ secrets.GITHUB_TOKEN }}
    action: 'upload'
    app_location: '/'
    api_location: 'api'
    app_build_command: 'npm run build:all-content && npm run build'
    api_build_command: 'npm install && npm run build'
    output_location: 'out'
```

Getting to this took many iterations. Here is each fix in turn.

Fix 1: output_location must match your framework's export directory

Next.js 15 with output: 'export' writes to out/ by default. The portal-created workflow had output_location: "build" — which is the React CRA default. Changed it to "out" and the deployment started finding the built site.

Fix 2: Don't use skip_app_build: true — use app_build_command

My first instinct, to speed things up, was to pre-build in a workflow step and pass skip_app_build: true to the SWA action. The problem: when Oryx (the SWA build system) is skipped, it doesn't know where to find the app output, and the deployment fails with a "Failed to find default file" error.

Instead keep Oryx involved and tell it what to run via app_build_command. Oryx handles runtime detection; your custom command runs inside that context.

Fix 3: The same applies to skip_api_build

skip_api_build: true produces "Function language info isn't provided" — Oryx needs to examine the api/ folder to detect the Node.js runtime. Use api_build_command to customise without skipping.

Fix 4: Azure Functions v4 requires a main entry point in package.json

Azure Functions v4 removed per-function function.json files in favour of code-based registration. But you also need to tell the runtime where the compiled output is:

```json
{
  "main": "dist/functions/*.js"
}
```

Without this, the deployment succeeds but every /api/* route returns 404.

Fix 5: Anchor your .gitignore dist/ rule to the repo root

I had dist/ in the root .gitignore. Without a leading slash, this matches dist/ anywhere in the repo — which silently excluded api/dist/ from git. The compiled function output was never committed and never deployed.

The fix: use /dist/ (with the leading slash) to anchor the rule to the repo root only.
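
The before and after, as .gitignore fragments:

```gitignore
# Before: matches a dist/ directory at any depth, so api/dist/ was ignored
dist/

# After: anchored to the repo root, so api/dist/ can be committed
/dist/
```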

Fix 6: Always verify your actual SWA hostname with the CLI

When I ran az staticwebapp show I discovered the hostname was proud-pebble-0f32c4110.1.azurestaticapps.net — not proud-pebble-0f32c4110.azurestaticapps.net as I'd been assuming. The .1. subdomain segment is a regional slot indicator that the Azure portal doesn't make obvious. Every test I'd run was hitting a non-existent host.

```bash
az staticwebapp show --name <app> --resource-group <rg> --query "defaultHostname" -o tsv
```

Always do this before debugging API responses.

Fix 7: Commit your generated files

The function fetches SITE_URL + /generated/search-index.json. I had public/generated/*.json in .gitignore — an early architectural decision that these should be runtime-generated, not committed. But with the SWA managed hosting model, the static file serving is what matters: if the file isn't in out/ it isn't served.

I commented out the gitignore rule and committed the generated files. CI regenerates them on every deploy via app_build_command to keep them current. The "Failed to load content index" error disappeared.

Fix 8: Remove response_format: { type: 'json_object' } for GitHub Models

The OpenAI Node.js SDK's response_format: { type: 'json_object' } works on Azure OpenAI and the OpenAI API. It does not work on the GitHub Models inference endpoint — it throws a 400 and the whole completion fails.

Enforce JSON via your system prompt only, and add defensive code-fence stripping:

```typescript
const cleaned = raw
  .replace(/^```(?:json)?\s*/i, '')
  .replace(/\s*```$/, '')
  .trim();
parsed = JSON.parse(cleaned);
```

Fix 9: GitHub fine-grained PAT requires the Models account permission

A PAT without the Models scope gets a 401: "The 'models' permission is required to access this endpoint." It's an Account Permission, not a repository permission, so it's easy to miss.

When creating your fine-grained token: Settings → Developer settings → Fine-grained tokens → Account Permissions → Models → Read-only.

The end-to-end flow

After all of that, the complete flow works:

  1. Visitor types a question in the chat UI
  2. NLInterface.tsx calls GET /api/ask?query=...
  3. The Azure Function fetches and caches search-index.json
  4. It builds a system prompt containing the full content catalog
  5. GitHub Models (gpt-4o-mini) returns { message, relevant: [3, 7, 1] }
  6. The function maps indices back to full result objects
  7. The NLWeb-shaped response is returned and rendered as a chat message + content cards
  8. If the API call fails at any point, the UI falls back to keyword search

The whole thing — content model, pipeline, knowledge graph, AI chat, CI/CD — is tagged v1.0.0-mvp in the repo.

Summary of all lessons

| # | Area | Lesson |
|---|------|--------|
| 1 | Architecture | Lock down hosting, framework, and authoring model decisions before writing code |
| 2 | Architecture | Ship a working v1 (keyword search) before wiring up AI — gives you a tested fallback |
| 3 | Content model | A unified front matter schema across all content types pays dividends in the graph and AI prompt |
| 4 | Content pipeline | A validate:content script as a PR gate keeps the content index clean |
| 5 | CI/CD | Match output_location to your framework's actual output directory (out/ not build/) |
| 6 | CI/CD | Don't use skip_app_build — use app_build_command instead |
| 7 | CI/CD | Don't use skip_api_build — use api_build_command instead |
| 8 | Azure Functions | v4 needs "main": "dist/functions/*.js" in api/package.json |
| 9 | Git | Use /dist/ (anchored) not dist/ in root .gitignore |
| 10 | Azure | Always verify the SWA hostname with az staticwebapp show — it may have .1. in it |
| 11 | Deployment | Commit generated files, or ensure app_build_command produces them |
| 12 | GitHub Models | Remove response_format: json_object — not supported on this endpoint |
| 13 | GitHub Models | Fine-grained PAT needs the Models (Read-only) account permission, not just repo scopes |

If you're building something similar — a personal site with grounded AI search on Azure Static Web Apps — I hope this saves you a good few hours. Most of these were obvious in hindsight. None of them were documented anywhere I found at the time.
