From ed1fedec49ad6482024cd155493e3dca39d8139c Mon Sep 17 00:00:00 2001
From: WJZ_P <110795301+WJZ-P@users.noreply.github.com>
Date: Wed, 25 Mar 2026 17:15:27 +0800
Subject: [PATCH] docs: add README.en.md English documentation and update the README.md language switch link
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.en.md | 263 +++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md    |   2 +
 2 files changed, 265 insertions(+)
 create mode 100644 README.en.md

diff --git a/README.en.md b/README.en.md
new file mode 100644
index 0000000..84b6b2e
--- /dev/null
+++ b/README.en.md
@@ -0,0 +1,263 @@
+# Gemini Skill
+
+English | [中文](./README.md)
+
+Automate the Gemini web app (gemini.google.com) via CDP (Chrome DevTools Protocol): AI image generation, conversations, image extraction, and more.
+
+## ✨ Features
+
+- 🎨 **AI Image Generation**: Send prompts to generate images, with full-size high-res download support
+- 💬 **Text Conversations**: Multi-turn dialogue with Gemini
+- 🖼️ **Image Upload**: Upload reference images for image-to-image generation
+- 📥 **Image Extraction**: Extract images from sessions via base64 or CDP full-size download
+- 🔄 **Session Management**: New chat, temp chat, model switching, navigate to historical sessions
+- 🧹 **Auto Watermark Removal**: Downloaded images automatically have the Gemini watermark stripped
+- 🤖 **MCP Server**: Standard MCP protocol interface, callable by any MCP client (Claude, CodeBuddy, etc.)
+
+## 🏗️ Architecture
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   MCP Client (AI)                   │
+│              Claude / CodeBuddy / ...               │
+└──────────────────────┬──────────────────────────────┘
+                       │ stdio (JSON-RPC)
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│         mcp-server.js (MCP Protocol Layer)          │
+│        Registers all MCP tools, orchestrates        │
+└──────────────────────┬──────────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│      index.js → browser.js (Connection Layer)       │
+│   ensureBrowser() → auto-start Daemon → CDP link    │
+└──────────┬──────────────────────────────┬───────────┘
+           │ HTTP (acquire/status)        │ WebSocket (CDP)
+           ▼                              ▼
+┌──────────────────────┐    ┌─────────────────────────┐
+│  Browser Daemon      │    │  Chrome / Edge          │
+│  (standalone process)│───▶│  gemini.google.com      │
+│  daemon/server.js    │    │                         │
+│  ├─ engine.js        │    │  Stealth + anti-detect  │
+│  ├─ handlers.js      │    └─────────────────────────┘
+│  └─ lifecycle.js     │
+│  30-min idle TTL     │
+└──────────────────────┘
+```
+
+**Core Design Principles:**
+
+- **Daemon Mode**: The browser process is managed by a standalone Daemon. After MCP calls finish, the browser stays alive; it auto-terminates only after 30 minutes of inactivity.
+- **On-demand Auto-start**: If the Daemon isn't running, MCP tools will automatically spawn it. No manual startup required.
+- **Stealth Anti-detect**: Uses `puppeteer-extra-plugin-stealth` to bypass website bot detection.
+- **Separation of Concerns**: `mcp-server.js` (protocol) → `gemini-ops.js` (operations) → `browser.js` (connection) → `daemon/` (process management)
+
+## 📦 Installation
+
+### Prerequisites
+
+- **Node.js** ≥ 18
+- **Chrome / Edge / Chromium**: Any one of these must be installed on your system (or specify a path via `BROWSER_PATH`)
+- The browser must be **logged into a Google account** beforehand (Gemini requires authentication)
+
+### Install Dependencies
+
+```bash
+git clone https://github.com/yourname/gemini-skill.git
+cd gemini-skill
+npm install
+```
+
+## ⚙️ Configuration
+
+All configuration is done via environment variables or a `.env` file. Create a `.env` file in the project root:
+
+```env
+# Browser executable path (auto-detects Chrome/Edge/Chromium if unset)
+# BROWSER_PATH=C:\Program Files\Google\Chrome\Application\chrome.exe
+
+# CDP remote debugging port (default: 40821)
+# BROWSER_DEBUG_PORT=40821
+
+# Headless mode (default: false; keep it off for first-time login)
+# BROWSER_HEADLESS=false
+
+# Image output directory (default: ./gemini-image)
+# OUTPUT_DIR=./gemini-image
+
+# Daemon HTTP port (default: 40225)
+# DAEMON_PORT=40225
+
+# Daemon idle timeout in ms (default: 30 minutes)
+# DAEMON_TTL_MS=1800000
+```
+
+`.env.development` is also supported (takes priority over `.env`).
+
+**Priority order:** `process.env` > `.env.development` > `.env` > code defaults
+
+## 🚀 Usage
+
+### Option 1: As an MCP Server (Recommended)
+
+Add the following to your MCP client configuration:
+
+```json
+{
+  "mcpServers": {
+    "gemini": {
+      "command": "node",
+      "args": ["/src/mcp-server.js"]
+    }
+  }
+}
+```
+
+Once started, the AI can invoke all tools via the MCP protocol.
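The `args` entry in the MCP client configuration has to point at `src/mcp-server.js` inside your local checkout. As a minimal sketch (the helper name `buildMcpConfig` and the example checkout path are hypothetical and not part of this project), the config entry could be generated like this:

```javascript
// Hypothetical helper: builds the MCP client config entry for this project.
// `projectRoot` is an assumption: the absolute path of your local checkout.
function buildMcpConfig(projectRoot) {
  // Drop trailing slashes/backslashes so the joined path has no doubled separator.
  const root = projectRoot.replace(/[\\/]+$/, '');
  return {
    mcpServers: {
      gemini: {
        command: 'node',
        // Node.js accepts forward slashes in script paths on Windows as well.
        args: [`${root}/src/mcp-server.js`],
      },
    },
  };
}

// Print JSON that can be pasted into the client's configuration file.
// The path below is only an example location.
console.log(JSON.stringify(buildMcpConfig('/home/me/gemini-skill'), null, 2));
```

Paste the printed JSON into your MCP client's configuration, replacing the example path with the location of your own checkout.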
+
+### Option 2: Command Line
+
+```bash
+# Start MCP Server (stdio mode, for AI clients)
+npm run mcp
+
+# Start Browser Daemon standalone (usually unnecessary; MCP auto-starts it)
+npm run daemon
+
+# Run the demo
+npm run demo
+```
+
+### Option 3: As a Library
+
+```javascript
+import { createGeminiSession, disconnect } from './src/index.js';
+
+const { ops } = await createGeminiSession();
+
+// Generate an image
+const result = await ops.generateImage('Draw a cute cat', { fullSize: true });
+console.log('Image saved to:', result.filePath);
+
+// Disconnect when done (browser stays alive, managed by Daemon)
+disconnect();
+```
+
+## 🔧 MCP Tools
+
+### Image Generation
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_generate_image` | Full image generation pipeline (takes 60–120s) | `prompt`, `newSession`, `referenceImages`, `fullSize`, `timeout` |
+
+### Session Management
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_new_chat` | Start a new blank conversation | - |
+| `gemini_temp_chat` | Enter temporary chat mode (no history saved) | - |
+| `gemini_navigate_to` | Navigate to a specific Gemini URL (e.g. a saved session) | `url`, `timeout` |
+
+### Model & Conversation
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_switch_model` | Switch model (pro / quick / think) | `model` |
+| `gemini_send_message` | Send text and wait for reply (takes 10–60s) | `message`, `timeout` |
+
+### Image Operations
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_upload_images` | Upload images to the input box | `images` |
+| `gemini_get_images` | List all images in the current session (metadata only) | - |
+| `gemini_extract_image` | Extract image base64 data and save locally | `imageUrl` |
+| `gemini_download_full_size_image` | Download full-size high-res image | `index` |
+
+### Text Responses
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_get_all_text_responses` | Get all text responses in the session | - |
+| `gemini_get_latest_text_response` | Get the latest text response | - |
+
+### Diagnostics & Management
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `gemini_check_login` | Check Google login status | - |
+| `gemini_probe` | Probe page element states | - |
+| `gemini_reload_page` | Reload the page | `timeout` |
+| `gemini_browser_info` | Get browser connection info | - |
+
+## 🔄 Daemon Lifecycle
+
+```
+First MCP call
+  │
+  ├─ Daemon not running → auto-spawn (detached + unref)
+  │                     → poll until ready (up to 15s)
+  │
+  ├─ GET /browser/acquire → launch/reuse browser + reset 30-min countdown
+  │
+  ├─ MCP tool finishes → disconnect() (closes WebSocket, keeps browser alive)
+  │
+  ├─ Another call within 30 min → countdown resets (extends TTL)
+  │
+  └─ 30 min with no activity → close browser + stop HTTP server + exit process
+                               (next call will auto-respawn)
+```
+
+**Daemon API Endpoints:**
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /browser/acquire` | Acquire browser connection (resets TTL) |
+| `GET /browser/status` | Query browser status (does NOT reset TTL) |
+| `POST /browser/release` | Manually destroy the browser |
+| `GET /health` | Daemon health check |
+
+## 📁 Project Structure
+
+```
+gemini-skill/
+├── src/
+│   ├── index.js              # Unified entry point
+│   ├── mcp-server.js         # MCP protocol server (registers all tools)
+│   ├── gemini-ops.js         # Gemini page operations (core logic)
+│   ├── operator.js           # Low-level DOM operation wrappers
+│   ├── browser.js            # Browser connector (Skill-facing)
+│   ├── config.js             # Centralized configuration
+│   ├── util.js               # Utility functions
+│   ├── watermark-remover.js  # Image watermark removal (via sharp)
+│   ├── demo.js               # Usage examples
+│   ├── assets/               # Static assets
+│   └── daemon/               # Browser Daemon (standalone process)
+│       ├── server.js         # HTTP micro-service entry
+│       ├── engine.js         # Browser engine (launch/connect/terminate)
+│       ├── handlers.js       # API route handlers
+│       └── lifecycle.js      # Lifecycle control (lazy shutdown timer)
+├── references/               # Reference documentation
+├── SKILL.md                  # AI invocation spec (read by MCP clients)
+├── package.json
+└── .env                      # Environment config (create manually)
+```
+
+## ⚠️ Notes
+
+1. **First-time login required**: On the first run, the browser will open the Gemini page. Complete Google account login manually. Login state is persisted in `userDataDir`, so subsequent runs won't require re-login.
+
+2. **Single instance only**: Only one browser instance can use a given CDP port. Running multiple instances will cause port conflicts.
+
+3. **Windows Server considerations**: Path normalization and Safe Browsing bypass are built-in, but double-check:
+   - Chrome/Edge is properly installed
+   - The output directory is writable
+   - The firewall is not blocking localhost traffic
+
+4. **Image generation takes time**: Typically 60–120 seconds. Set your MCP client's `timeoutMs` to ≥ 180000 (3 minutes).
+
+## 📄 License
+
+ISC
diff --git a/README.md b/README.md
index 086f699..1114bb7 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # Gemini Skill
 
+[English](./README.en.md) | 中文
+
 Automate the Gemini web app (gemini.google.com) via CDP (Chrome DevTools Protocol): AI image generation, conversations, image extraction, and other automated operations.
 
 ## ✨ Features