# Gemini Skill

English | [中文](./README.md)

Automate Gemini web (gemini.google.com) via CDP (Chrome DevTools Protocol): AI image generation, conversations, image extraction, and more.

## ✨ Features

- 🎨 **AI Image Generation**: Send prompts to generate images, with support for downloading full-size, high-resolution output
- 💬 **Text Conversations**: Multi-turn dialogue with Gemini
- 🖼️ **Image Upload**: Upload reference images for image-to-image generation
- 📥 **Image Extraction**: Extract images from sessions as base64 or via full-size CDP download
- 🔄 **Session Management**: New chat, temporary chat, model switching, and navigation to past sessions
- 🧹 **Auto Watermark Removal**: The Gemini watermark is automatically stripped from downloaded images
- 🤖 **MCP Server**: Standard MCP protocol interface, callable from any MCP client (Claude, CodeBuddy, etc.)

## 📸 Example

Generate game-style sticker images through an AI conversation:

*(Screenshot: Gemini image generation example)*

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────┐
│                   MCP Client (AI)                   │
│              Claude / CodeBuddy / ...               │
└──────────────────────┬──────────────────────────────┘
                       │ stdio (JSON-RPC)
                       ▼
┌─────────────────────────────────────────────────────┐
│         mcp-server.js (MCP Protocol Layer)          │
│       Registers and orchestrates all MCP tools      │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│      index.js → browser.js (Connection Layer)       │
│   ensureBrowser() → auto-start Daemon → CDP link    │
└──────────┬───────────────────────────────┬──────────┘
           │ HTTP (acquire/status)         │ WebSocket (CDP)
           ▼                               ▼
┌──────────────────────┐    ┌─────────────────────────┐
│  Browser Daemon      │    │  Chrome / Edge          │
│ (standalone process) │───▶│  gemini.google.com      │
│  daemon/server.js    │    │                         │
│  ├─ engine.js        │    │  Stealth + anti-detect  │
│  ├─ handlers.js      │    └─────────────────────────┘
│  └─ lifecycle.js     │
│  30-min idle TTL     │
└──────────────────────┘
```

**Core Design Principles:**

- **Daemon Mode**: The browser process is managed by a standalone Daemon. When an MCP call finishes, the browser stays alive; it terminates automatically only after 30 minutes of inactivity.
- **On-demand Auto-start**: If the Daemon isn't running, MCP tools spawn it automatically. No manual startup is required.
- **Stealth Anti-detect**: Uses `puppeteer-extra-plugin-stealth` to bypass the site's bot detection.
- **Separation of Concerns**: `mcp-server.js` (protocol) → `gemini-ops.js` (operations) → `browser.js` (connection) → `daemon/` (process management)

## 📦 Installation

### Prerequisites

- **Node.js** ≥ 18
- **Chrome / Edge / Chromium**: at least one must be installed on your system (or specify a path via `BROWSER_PATH`)
- The browser must be **logged into a Google account** beforehand (Gemini requires authentication)

### Install Dependencies

```bash
git clone https://github.com/yourname/gemini-skill.git
cd gemini-skill
npm install
```

## ⚙️ Configuration

All configuration is done through environment variables or a `.env` file. Create a `.env` file in the project root:

```env
# Browser executable path (auto-detects Chrome/Edge/Chromium if unset)
# BROWSER_PATH=C:\Program Files\Google\Chrome\Application\chrome.exe

# CDP remote debugging port (default: 40821)
# BROWSER_DEBUG_PORT=40821

# Headless mode (default: false; keep it off for first-time login)
# BROWSER_HEADLESS=false

# Image output directory (default: ./gemini-image)
# OUTPUT_DIR=./gemini-image

# Daemon HTTP port (default: 40225)
# DAEMON_PORT=40225

# Daemon idle timeout in ms (default: 30 minutes)
# DAEMON_TTL_MS=1800000
```

`.env.development` is also supported and takes priority over `.env`.

**Priority order:** `process.env` > `.env.development` > `.env` > code defaults

## 🚀 Usage

### Option 1: As an MCP Server (Recommended)

Add the following to your MCP client configuration:

```json
{
  "mcpServers": {
    "gemini": {
      "command": "node",
      "args": ["/src/mcp-server.js"]
    }
  }
}
```

Point `args` at the absolute path of `src/mcp-server.js` in your local clone. Once the server starts, the AI can invoke all tools via the MCP protocol.
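To make "invoke all tools via the MCP protocol" concrete, here is a minimal sketch of what an MCP client writes to the server's stdin. It assumes the standard MCP stdio transport (newline-delimited JSON-RPC 2.0) and the `2024-11-05` protocol revision; `buildToolCallMessages` and the client name are hypothetical illustrations, not part of this project. In practice you would use an MCP client SDK, and a real client waits for the server's `initialize` response before sending anything else.

```javascript
// Hypothetical helper (not part of gemini-skill): builds the newline-delimited
// JSON-RPC 2.0 messages an MCP client would write to mcp-server.js's stdin.
// Assumption: MCP protocol revision 2024-11-05 over the stdio transport.
function buildToolCallMessages(toolName, args, protocolVersion = '2024-11-05') {
  const messages = [
    // 1. Handshake request. A real client sends this first and waits for the
    //    server's reply before sending the remaining messages.
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: 'example-client', version: '0.0.1' }, // placeholder identity
      },
    },
    // 2. Notification (no id) telling the server the client is ready.
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    // 3. The actual tool invocation, using parameters from the MCP Tools tables.
    {
      jsonrpc: '2.0',
      id: 2,
      method: 'tools/call',
      params: { name: toolName, arguments: args },
    },
  ];
  // stdio framing: one JSON object per line, no embedded newlines.
  return messages.map((m) => JSON.stringify(m)).join('\n') + '\n';
}

const wire = buildToolCallMessages('gemini_generate_image', {
  prompt: 'Draw a cute cat',
  fullSize: true,
});
console.log(wire);
```

Hand-building the messages like this is only useful for debugging the wire format; because image generation can take 60–120 seconds, the client must also keep reading stdout until the response with `id: 2` arrives.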
### Option 2: Command Line

```bash
# Start the MCP Server (stdio mode, for AI clients)
npm run mcp

# Start the Browser Daemon standalone (usually unnecessary; MCP auto-starts it)
npm run daemon

# Run the demo
npm run demo
```

### Option 3: As a Library

```javascript
// Top-level await requires an ES module (e.g. a .mjs file or "type": "module" in package.json)
import { createGeminiSession, disconnect } from './src/index.js';

const { ops } = await createGeminiSession();

// Generate an image
const result = await ops.generateImage('Draw a cute cat', { fullSize: true });
console.log('Image saved to:', result.filePath);

// Disconnect when done (the browser stays alive, managed by the Daemon)
disconnect();
```

## 🔧 MCP Tools

### Image Generation

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_generate_image` | Full image generation pipeline (takes 60–120s) | `prompt`, `newSession`, `referenceImages`, `fullSize`, `timeout` |

### Session Management

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_new_chat` | Start a new blank conversation | None |
| `gemini_temp_chat` | Enter temporary chat mode (no history saved) | None |
| `gemini_navigate_to` | Navigate to a specific Gemini URL (e.g. a saved session) | `url`, `timeout` |

### Model & Conversation

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_switch_model` | Switch model (pro / quick / think) | `model` |
| `gemini_send_message` | Send text and wait for the reply (takes 10–60s) | `message`, `timeout` |

### Image Operations

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_upload_images` | Upload images to the input box | `images` |
| `gemini_get_images` | List all images in the current session (metadata only) | None |
| `gemini_extract_image` | Extract image base64 data and save it locally | `imageUrl` |
| `gemini_download_full_size_image` | Download a full-size, high-resolution image | `index` |

### Text Responses

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_get_all_text_responses` | Get all text responses in the session | None |
| `gemini_get_latest_text_response` | Get the latest text response | None |

### Diagnostics & Management

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `gemini_check_login` | Check Google login status | None |
| `gemini_probe` | Probe page element states | None |
| `gemini_reload_page` | Reload the page | `timeout` |
| `gemini_browser_info` | Get browser connection info | None |

## 🔄 Daemon Lifecycle

```
First MCP call
  │
  ├─ Daemon not running → auto-spawn (detached + unref)
  │                       → poll until ready (up to 15s)
  │
  ├─ GET /browser/acquire → launch/reuse browser + reset 30-min countdown
  │
  ├─ MCP tool finishes → disconnect() (closes WebSocket, keeps browser alive)
  │
  ├─ Another call within 30 min → countdown resets (extends TTL)
  │
  └─ 30 min with no activity → close browser + stop HTTP server + exit process
                               (next call will auto-respawn)
```

**Daemon API Endpoints:**

| Endpoint | Description |
|----------|-------------|
| `GET /browser/acquire` | Acquire a browser connection (resets TTL) |
| `GET /browser/status` | Query browser status (does NOT reset TTL) |
| `POST /browser/release` | Manually destroy the browser |
| `GET /health` | Daemon health check |

## 📁 Project Structure

```
gemini-skill/
├── src/
│   ├── index.js               # Unified entry point
│   ├── mcp-server.js          # MCP protocol server (registers all tools)
│   ├── gemini-ops.js          # Gemini page operations (core logic)
│   ├── operator.js            # Low-level DOM operation wrappers
│   ├── browser.js             # Browser connector (Skill-facing)
│   ├── config.js              # Centralized configuration
│   ├── util.js                # Utility functions
│   ├── watermark-remover.js   # Image watermark removal (via sharp)
│   ├── demo.js                # Usage examples
│   ├── assets/                # Static assets
│   └── daemon/                # Browser Daemon (standalone process)
│       ├── server.js          # HTTP micro-service entry
│       ├── engine.js          # Browser engine (launch/connect/terminate)
│       ├── handlers.js        # API route handlers
│       └── lifecycle.js       # Lifecycle control (lazy shutdown timer)
├── references/                # Reference documentation
├── SKILL.md                   # AI invocation spec (read by MCP clients)
├── package.json
└── .env                       # Environment config (create manually)
```

## ⚠️ Notes

1. **First-time login required**: On the first run, the browser opens the Gemini page; complete the Google account login manually. Login state is persisted in `userDataDir`, so later runs won't require logging in again.
2. **Single instance only**: Only one browser instance can use a given CDP port. Running multiple instances causes port conflicts.
3. **Windows Server considerations**: Path normalization and a Safe Browsing bypass are built in, but double-check that:
   - Chrome/Edge is properly installed
   - The output directory is writable
   - The firewall is not blocking localhost traffic
4. **Image generation takes time**: Typically 60–120 seconds. Set your MCP client's `timeoutMs` to ≥ 180000 (3 minutes).

## 📄 License

ISC