
Reverse-Engineering Google NotebookLM

Google NotebookLM is one of the most quietly powerful AI products out there. You drop in research papers, lecture notes, legal briefs — whatever — and you get an AI that only knows what you know. It generates quizzes from your syllabus. It turns your thesis into a podcast. It creates flashcards, mind maps, slide decks, video overviews — all grounded in your data, not the internet's.

There's no API. No REST endpoints. No SDK. No documentation. Google built this incredible product and locked the front door. If you wanted to build on top of it — automate it, integrate it into a larger system — you were out of luck.

So I reverse-engineered the whole thing.


Christmas Day, 2025

I remember the exact moment. December 25th. I was looking at NotebookLM in the browser, thinking about a workflow I wanted to automate: take a set of URLs, create a notebook, add the URLs as sources, wait for processing, generate a quiz, download it. Five manual steps I'd been doing every day.

I opened DevTools.

NotebookLM doesn't use a clean REST API under the hood. It uses Google's internal batchexecute protocol — the same RPC-over-HTTP mechanism that powers Google Docs, Google Search, and most of Google's web products. Requests are URL-encoded form data with nested JSON payloads. Responses are wrapped in layers of arrays. Nothing is documented.

I started mapping RPC methods. zKMnhd for creating projects. o3aBmc for listing them. Every method is an obfuscated string that could change with any Google deployment. The payloads are nested arrays where position matters more than keys. One wrong index and you get a cryptic error or, worse, silence.
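
Reduced to code, a single RPC round trip looks roughly like this. It's a sketch, not a contract: the endpoint path, the 'generic' envelope marker, and the argument layout are reconstructed from DevTools traffic and can shift with any deployment.

// Sketch of one batchexecute round trip. The URL path, the "generic" marker,
// and the argument array layout are reconstructions, not a stable contract.
async function callRpc(rpcId: string, args: unknown[], authToken: string, cookie: string) {
  const body = new URLSearchParams({
    // f.req is a JSON array of RPC envelopes; arguments are themselves
    // JSON-stringified and purely positional.
    'f.req': JSON.stringify([[[rpcId, JSON.stringify(args), null, 'generic']]]),
    at: authToken, // anti-CSRF token scraped from the page (see the auth section below)
  });

  const res = await fetch('https://notebooklm.google.com/_/data/batchexecute', {
    method: 'POST',
    headers: {
      'content-type': 'application/x-www-form-urlencoded;charset=UTF-8',
      cookie,
    },
    body,
  });

  // The response arrives with a )]}' prefix and byte-count markers that have
  // to be stripped before the nested arrays can be JSON.parsed.
  return res.text();
}

// e.g. listing notebooks via the o3aBmc RPC (the argument shape is a guess):
// const raw = await callRpc('o3aBmc', [null, 1], token, cookieString);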

First commit went up at midnight on Christmas. By the next morning, notebooks were creating and listing. By December 26th — sources, artifacts, audio, video, slides, languages, auto-refresh, quota management. 3,738 lines in a single day. The kind of zone where you forget to eat and the sun sets without you noticing.


The Three Bugs That Nearly Killed Streaming

Chat was the hardest feature. NotebookLM's chat uses a streaming endpoint (GenerateFreeFormStreamed) that returns chunked responses in a custom format:

<byte_count>
[["wrb.fr", null, "<escaped_json>"]]

Each chunk contains a snapshot of the full response so far, not a delta. The text can shrink between chunks — the API revises its own responses mid-stream. I'd never seen anything like it.
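
Consuming it means treating every chunk as a replacement, never an append. A minimal sketch of that handling, with the envelope unwrapping simplified (real chunks also need buffering across network reads):

// Every chunk is a snapshot of the whole response; the consumer overwrites
// its buffer instead of appending, and the text is allowed to shrink.
function parseChunk(raw: string): string | null {
  const start = raw.indexOf('[[');            // skip the leading <byte_count> line
  if (start === -1) return null;
  try {
    const envelope = JSON.parse(raw.slice(start)) as unknown[][];
    const escaped = envelope[0]?.[2];
    // The escaped JSON holds the actual payload; pulling the answer text out
    // of it is its own positional-array adventure, elided here.
    return typeof escaped === 'string' ? escaped : null;
  } catch {
    return null;                              // partial chunk, keep buffering
  }
}

function latestSnapshot(chunks: string[]): string {
  let latest = '';
  for (const chunk of chunks) {
    const parsed = parseChunk(chunk);
    if (parsed !== null) latest = parsed;     // overwrite, never append
  }
  return latest;
}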

Missing source-path. The streaming URL requires a source-path parameter that isn't documented anywhere. Without it, the request silently succeeds but returns empty responses. Hours staring at 200 OK with no data before I caught it in the network tab of a working session.

Double URL encoding. The source-path contains a URL-encoded notebook ID. When I encoded the full URL, the notebook ID got double-encoded. Google's servers rejected it silently. No error, no response, nothing. The fix was a single line — use the raw value — but finding it took a full day.

Wrong ID position in request body. The chat request body requires the notebook ID as the last parameter in a nested array. I had it in the wrong position. The API returned a generic error that gave no indication of what was actually wrong.

I rewrote the streaming client twice. The second version has this at the top:

CRITICAL FIXES MAINTAINED:
1. source-path parameter in URL
2. Single URL encoding (no double encoding)
3. notebookId as last parameter in request body

Warnings to my future self. Every line represents hours of debugging.
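
In code, the three warnings translate to something like this. Only the source-path requirement, the single encoding, and the ID-goes-last rule are the real constraints; the endpoint path and the rest of the layout are illustrative:

// Sketch of the request construction those warnings protect.
function buildChatStreamRequest(notebookId: string, question: string) {
  // Fix 1 + 2: source-path is mandatory, and the notebook ID inside it is
  // already URL-encoded. Append it raw; running it through encodeURIComponent
  // (or URLSearchParams) again double-encodes it and the server silently
  // drops the request.
  const url =
    'https://notebooklm.google.com/_/data/batchexecute' +
    '?rpcids=GenerateFreeFormStreamed' +
    `&source-path=/notebook/${notebookId}`;

  // Fix 3: the notebook ID rides in the LAST slot of the nested argument array.
  const args = [[question], null, notebookId];

  const body = new URLSearchParams({
    'f.req': JSON.stringify([[['GenerateFreeFormStreamed', JSON.stringify(args), null, 'generic']]]),
  });

  return { url, body };
}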


Google Doesn't Want You Doing This

Every few weeks, something breaks. Not because my code changed — because Google's did.

NotebookLM is a living product. Google deploys constantly. RPC method signatures shift. Response formats change. Features appear and vanish.

Notebook descriptions don't exist. I built a full description feature — create, update, display. Then realized Google had removed it from NotebookLM: the current UI has no description field at all. I'd been writing to a parameter that was silently ignored.

Artifact state 3 means... READY? The API returns numeric state codes. 0 = unknown, 1 = creating, 2 = ready, 3 = failed. At least, that's what I assumed. Then users reported state 3 for perfectly valid, fully-generated artifacts. In practice, the API uses both 2 and 3 to indicate "ready." The enum says FAILED = 3. The implementation maps both 2 and 3 to READY. Welcome to reverse engineering.
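
In code, the only sane response is to trust the observed behavior over the enum. The names here are my own labels; the numbers are what the API actually returns:

// Numeric states as the API returns them; the names are my own labels.
enum ArtifactState {
  UNKNOWN = 0,
  CREATING = 1,
  READY = 2,
  FAILED = 3, // what the value "should" mean
}

// In practice the API returns both 2 and 3 for finished artifacts,
// so the readiness check has to accept either.
function isArtifactReady(state: number): boolean {
  return state === ArtifactState.READY || state === ArtifactState.FAILED;
}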

2FA breaks slide downloads. Downloading slides requires authenticated access to Google's rendering pipeline. When a user has 2FA enabled, the Playwright session needs special handling. Cookies expire differently. Session tokens behave differently.


The Authentication Nightmare

There's no API key. No OAuth flow. No service account. To authenticate, you need an auth token (extracted from a JavaScript variable called WIZ_global_data.SNlM0e on the page) and a full cookie string from an authenticated browser session.
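
Extracting those two values from a logged-in session is mechanical. With a Playwright page it boils down to roughly this (a sketch of the idea, not the SDK's exact code):

import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://notebooklm.google.com');
// ...log in by hand here, 2FA included, then:

// The auth token lives in a global the page defines.
const authToken = await page.evaluate(
  () => (window as any).WIZ_global_data?.SNlM0e as string | undefined,
);

// The cookie string is every cookie from the authenticated context, joined.
const cookieString = (await page.context().cookies())
  .map((c) => `${c.name}=${c.value}`)
  .join('; ');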

I built three methods:

Manual credentials. Copy your auth token and cookies from DevTools, paste into .env. Works immediately, expires in hours.

Browser extraction. The SDK opens Chromium, navigates to NotebookLM, waits for you to log in (including 2FA), then extracts credentials automatically.

Auto-login. Provide your Google email and password. The SDK uses Playwright to automate the entire login flow — type email, click next, type password, click next, handle 2FA prompts.

Then auto-refresh on top: a background timer that re-extracts the auth token every 10 minutes from an active session. Connect once and the SDK keeps itself alive.

const sdk = new NotebookLMClient({
  auth: {
    email: process.env.GOOGLE_EMAIL,
    password: process.env.GOOGLE_PASSWORD,
    headless: false, // keep the browser visible so 2FA prompts can be answered
  },
  autoRefresh: true, // background timer re-extracts the auth token every 10 minutes
});
 
await sdk.connect();

It's not elegant. It's the only way to programmatically authenticate with a product that has no API.


Auto-Chunking

NotebookLM has limits. 500,000 words per source. 200MB per file. Users don't care about limits. They want to upload their entire research corpus and have it work.

When you try to add a text source with 1.2 million words, the SDK silently splits it into three 400K-word chunks and uploads them as separate sources. Each chunk gets metadata — chunk index, word boundaries, byte ranges — so you can reconstruct the original if needed. Small sources still return a simple string ID. Large ones return an AddSourceResult with wasChunked: true and all the chunk IDs.

const result = await sdk.sources.add.text(notebookId, {
  title: 'My Massive Research Paper',
  content: aMillionWordString,
});
 
if (typeof result === 'string') {
  // Small source, single ID
} else if (result.wasChunked) {
  console.log(`Split into ${result.chunks.length} chunks`);
}

The feature that made the SDK feel production-ready.
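
Under the hood, the splitting step itself is simple. A sketch, assuming the 400K-word chunk size mentioned above (the real implementation also tracks byte ranges and uploads each piece as its own source):

interface TextChunk {
  index: number;
  startWord: number;
  endWord: number;
  content: string;
}

// Split on whitespace and cut every maxWords words, keeping enough metadata
// to reassemble the original text later.
function chunkText(content: string, maxWords = 400_000): TextChunk[] {
  const words = content.split(/\s+/);
  const chunks: TextChunk[] = [];
  for (let start = 0; start < words.length; start += maxWords) {
    const end = Math.min(start + maxWords, words.length);
    chunks.push({
      index: chunks.length,
      startWord: start,
      endWord: end,
      content: words.slice(start, end).join(' '),
    });
  }
  return chunks;
}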


The Artifact Engine

NotebookLM's artifacts are its killer feature. From the same set of sources: quizzes, flashcards, mind maps, reports, infographics, slide decks, audio overviews (two-person podcast discussions), video overviews (animated explainers).

Each artifact type uses a different RPC method with a different payload structure. I unified all of it behind a single interface:

const quiz = await sdk.artifacts.create(notebookId, ArtifactType.QUIZ, {
  customization: { numberOfQuestions: 3, difficulty: 3 },
});
 
const audio = await sdk.artifacts.audio.create(notebookId, {
  customization: { format: 0, language: 'en', length: 2 },
});

Same pattern, every time. Create, poll for ready, download. The SDK handles the differences.
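
The polling half of that pattern, sketched against the quiz from the example above. The get() and download() calls are stand-in names, not necessarily the SDK's real surface; the shape is what matters:

// Illustrative only: create, poll until the state code says done, download.
let artifact = quiz;
while (artifact.state === 1) {                       // 1 = still generating (state codes above)
  await new Promise((resolve) => setTimeout(resolve, 5_000));
  artifact = await sdk.artifacts.get(notebookId, quiz.id);
}
await sdk.artifacts.download(notebookId, quiz.id);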

Supporting 80+ languages was its own thing. Each artifact type handles language differently. Audio and video accept a language code. Reports infer from the notebook's default. Quizzes and flashcards need the language instruction in the text prompt. Built a NotebookLanguageService to manage all of this — set it once at the notebook level and everything downstream respects it.
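
Usage looks roughly like this. The method names are stand-ins; only the service itself and the notebook-level default come from what's described above:

// Illustrative: set the default once at the notebook level.
const languages = new NotebookLanguageService(sdk);
await languages.setNotebookLanguage(notebookId, 'de');

// From here on, audio and video get the language code directly, reports
// inherit the notebook default, and quiz/flashcard prompts get the language
// instruction injected for them.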


Killing Features

One of the hardest parts of building against an undocumented API is knowing when to kill things.

I built a chat-with-source example — then realized chat-basic already supported source selection. Redundant. Killed it. Built a document guideline generator — over-engineered, achievable with a chat prompt. Killed it. Built loadContent() to extract source text — worked for a while, then started returning "Service unavailable" after a Google update. Deprecated it.

Every dead feature is a maintenance burden. Every deprecated method is a promise you'll eventually break. Getting comfortable with removal made the SDK better than getting comfortable with accumulation.


The Commit Messages

If you read the git log, you see the real journey:

"idk what this was" — changes made, forgot what they were, committed anyway.

"idk wtf iam doin" — honest work in progress.

"wtf" — when the API returns something you genuinely cannot explain.

"fixed auth" — the most common commit in software history. Auth is never really fixed.

These sit alongside "Implemented Notebook Language Service to manage default output language for notebooks, enhancing artifact creation and chat responses." The git log of any real project is a mood graph. Moments of clarity and moments of confusion. The breakthroughs and the 2 AM hacks. That's software.


What This Opens Up

A professor uploads lecture slides — the system generates quizzes, flashcards, and audio summaries automatically. A researcher has 200 papers to synthesize — the SDK creates a notebook, adds them all (auto-chunked), generates a report. A podcast producer has 500 episodes — each becomes a source, the SDK generates blog posts, social content, study guides. A company's internal docs sync into notebooks and employees can query collective knowledge with citations.

And the big one: NotebookLM as a tool that AI agents can use. An agent researching a topic creates a notebook, adds sources, chats with it to synthesize, generates artifacts — all programmatically. The SDK becomes a knowledge grounding skill for any agent framework.

Google built an incredible product. I built the bridge to it.


notebooklm-kit is open source on GitHub and published on npm.