Developer Docs

API Keys

Two types of API keys:

Prefix	Scope	Use
`gpu_`	Account-wide	CLI login, REST API (rent, stop, list rentals)
(hex, no prefix)	Per-rental	Ollama proxy access (`/api/v1/ollama/:rentalId/*`)
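
Because the two key types are told apart only by their prefix, a client can sanity-check which kind it is holding before sending a request. The Python sketch below is one way to do that. `key_scope` is a hypothetical helper (not part of the CLI or API), and it assumes that per-rental keys contain only hexadecimal characters, as the table suggests.

```python
import string

HEX_DIGITS = set(string.hexdigits)
ACCOUNT_PREFIX = "gpu_"


def key_scope(key: str) -> str:
    """Classify an API key as "account" or "rental" based on its prefix."""
    if key.startswith(ACCOUNT_PREFIX) and len(key) > len(ACCOUNT_PREFIX):
        return "account"  # account-wide key: CLI login and REST API
    if key and all(ch in HEX_DIGITS for ch in key):
        return "rental"   # assumed format: bare hex string for the Ollama proxy
    raise ValueError("unrecognized API key format")
```

A client could call this before choosing between the marketplace endpoints and the `/api/v1/ollama/:rentalId/*` proxy.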

Generate an account key from Settings → API Keys, via the CLI, or via the API:

CLI

computegpu login --key gpu_your_api_key_here

REST API

POST /auth/api-key
Content-Type: application/json
Cookie: (session)

{"label": "My CLI Key"}

CLI Tool

npm install -g computegpu

Setup

computegpu login gpu_your_api_key_here

Commands

`computegpu models`	List GPU models with availability and pricing
`computegpu search --gpu RTX4090`	Search available listings
`computegpu rent <listingId>`	Rent a GPU, get API key + endpoint
`computegpu rentals`	List active rentals
`computegpu status <rentalId>`	Rental usage stats
`computegpu stop <rentalId>`	Stop and settle

Search flags

computegpu search \
  --gpu RTX4090 \
  --max-price 1.00 \
  --vram 24 \
  --model llama3 \
  --sort price

REST API

Base URL: https://computegpu.com/api/v1/marketplace

Public (no auth)

Method	Endpoint	Description
GET	`/search`	Search listings. Params: `gpu`, `maxPrice`, `minVram`, `ollamaModel`, `instanceType`, `pricingTier`, `sort`, `limit`, `offset`
GET	`/gpu-models`	GPU catalogue summary
GET	`/listings/:id`	Listing detail + health
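
As an illustration of how the query parameters combine, the sketch below builds a `/search` URL in Python. `search_url` is a hypothetical helper. The parameter names come from the table above, while the value `price` for `sort` is assumed to mirror the CLI's `--sort price` flag.

```python
from urllib.parse import urlencode

MARKETPLACE_BASE = "https://computegpu.com/api/v1/marketplace"


def search_url(**params) -> str:
    """Build a /search URL, skipping any parameter whose value is None."""
    query = {name: value for name, value in params.items() if value is not None}
    url = f"{MARKETPLACE_BASE}/search"
    # urlencode takes care of escaping spaces and other special characters.
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    print(search_url(gpu="RTX4090", minVram=24, sort="price", limit=10))
```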

Authenticated (`Authorization: Bearer gpu_...`)

Method	Endpoint	Description
POST	`/rent/:listingId`	Rent a GPU. Returns rental ID + proxy API key
POST	`/rentals/:id/stop`	Stop rental. Returns cost summary
GET	`/rentals`	List your rentals. Param: `status`
GET	`/rentals/:id`	Rental detail + usage

CLI

computegpu types                   # Browse GPU models
computegpu cheapest --vram 24      # Find cheapest with 24GB+ VRAM

REST API: Search + Rent

# Search
curl "https://computegpu.com/api/v1/marketplace/search?gpu=RTX4090"

# Rent
curl -X POST https://computegpu.com/api/v1/marketplace/rent/LISTING_ID \
  -H "Authorization: Bearer gpu_your_key"

# Use the GPU
curl https://computegpu.com/api/v1/ollama/RENTAL_ID/api/chat \
  -H "Authorization: Bearer RENTAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'

# Stop
curl -X POST https://computegpu.com/api/v1/marketplace/rentals/RENTAL_ID/stop \
  -H "Authorization: Bearer gpu_your_key"
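
The same flow can be scripted. The sketch below uses only Python's standard library and builds the three requests without sending them, mainly to show which key goes with which call: the account-wide `gpu_` key for renting and stopping, the per-rental key for the Ollama proxy. The helper names are hypothetical, and because the JSON field names in the rent response are not documented here, extracting the rental ID and proxy key is left to the caller.

```python
import json
import urllib.request

API_ROOT = "https://computegpu.com/api/v1"


def rent_request(listing_id: str, account_key: str) -> urllib.request.Request:
    """POST /marketplace/rent/:listingId, authenticated with the account key."""
    return urllib.request.Request(
        f"{API_ROOT}/marketplace/rent/{listing_id}",
        method="POST",
        headers={"Authorization": f"Bearer {account_key}"},
    )


def chat_request(rental_id: str, rental_key: str, prompt: str,
                 model: str = "llama3") -> urllib.request.Request:
    """POST to the Ollama proxy, authenticated with the per-rental key."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/ollama/{rental_id}/api/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {rental_key}",
            "Content-Type": "application/json",
        },
    )


def stop_request(rental_id: str, account_key: str) -> urllib.request.Request:
    """POST /marketplace/rentals/:id/stop, authenticated with the account key."""
    return urllib.request.Request(
        f"{API_ROOT}/marketplace/rentals/{rental_id}/stop",
        method="POST",
        headers={"Authorization": f"Bearer {account_key}"},
    )
```

Passing any of these objects to `urllib.request.urlopen` sends the request.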

GPU Cloud API

Deploy, manage, and destroy cloud GPU instances programmatically. Supports single-GPU and multi-GPU (up to 8x) configurations.

Base URL: https://computegpu.com/api/v1/gpu

Public (no auth)

Method	Endpoint	Description
GET	`/types`	All GPU types with live pricing. Returns `pricePerHour`, `spotPricePerHour`, `vramGb`
GET	`/cheapest?vram=24`	Find cheapest GPU for a VRAM requirement
GET	`/multi-gpu`	Multi-GPU configs with pricing per GPU count (1/2/4/8x), interconnect type, max count

Authenticated (`Authorization: Bearer gpu_...`)

Method	Endpoint	Description
POST	`/deploy`	Deploy on-demand GPU instance. Body: `{"gpuType":"RTX 4090","gpuCount":1,"models":["llama3.3"],"instanceType":"ollama","volumeGb":50}`
POST	`/deploy-spot`	Deploy spot GPU (50% off, preemptible). Body: `{"gpuType":"RTX 4090","gpuCount":1,"bidPerGpu":0.25}`
GET	`/pods`	List your active cloud GPU instances
GET	`/pods/:id`	Instance status + live billing
PUT	`/pods/:id/stop`	Pause instance (billing pauses, volume kept)
PUT	`/pods/:id/resume`	Resume paused instance
DELETE	`/pods/:id`	Destroy instance permanently. Returns final cost

CLI (recommended)

computegpu deploy --gpu "RTX 4090"    # Deploy instance
computegpu status POD_ID              # Check status
computegpu stop POD_ID                # Pause billing
computegpu resume POD_ID              # Resume
computegpu destroy POD_ID             # Permanent destroy

REST API: Deploy + Use + Destroy

# Browse GPU types
curl https://computegpu.com/api/v1/gpu/types

# Deploy an RTX 4090 with Ollama
curl -X POST https://computegpu.com/api/v1/gpu/deploy \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"gpuType":"RTX 4090","models":["llama3.3","gemma4:e4b"]}'

# Check instance status
curl https://computegpu.com/api/v1/gpu/pods/POD_ID \
  -H "Authorization: Bearer gpu_your_key"

# Pause billing (keep volume)
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/stop \
  -H "Authorization: Bearer gpu_your_key"

# Resume
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/resume \
  -H "Authorization: Bearer gpu_your_key"

# Destroy (permanent, final billing)
curl -X DELETE https://computegpu.com/api/v1/gpu/pods/POD_ID \
  -H "Authorization: Bearer gpu_your_key"

Deploy response

{
  "instance": {
    "id": "68abc123def456",
    "gpuType": "RTX 4090",
    "vramGb": 24,
    "pricePerHour": 0.51,
    "instanceType": "ollama",
    "status": "deploying",
    "pricingTier": "on-demand",
    "startedAt": "2026-04-30T15:00:00.000Z"
  }
}
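
To work with this response in code, it can help to parse it into a typed structure. The sketch below does so with a Python dataclass. The class and function names are hypothetical, the field names are taken from the sample above, and it assumes `startedAt` is an ISO 8601 UTC timestamp ending in `Z`.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Instance:
    id: str
    gpu_type: str
    vram_gb: int
    price_per_hour: float
    instance_type: str
    status: str
    pricing_tier: str
    started_at: datetime


def parse_deploy_response(text: str) -> Instance:
    """Turn the JSON deploy response into an Instance."""
    raw = json.loads(text)["instance"]
    started = raw["startedAt"]
    if started.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        started = started[:-1] + "+00:00"
    return Instance(
        id=raw["id"],
        gpu_type=raw["gpuType"],
        vram_gb=raw["vramGb"],
        price_per_hour=raw["pricePerHour"],
        instance_type=raw["instanceType"],
        status=raw["status"],
        pricing_tier=raw["pricingTier"],
        started_at=datetime.fromisoformat(started),
    )
```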

Access types

Set instanceType in the deploy body:

Type	What you get
`ollama`	Ollama-compatible API endpoint via proxy. Default.
`docker`	Docker container with SSH + Jupyter access
`ssh`	Full shell access to the GPU machine
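
A small helper can catch invalid combinations before a request is sent. The following Python sketch builds a `/deploy` body from the fields shown earlier. `build_deploy_body` is a hypothetical name, and the accepted GPU counts (1, 2, 4, 8) are an assumption drawn from the `/multi-gpu` description rather than a documented validation rule.

```python
import json

INSTANCE_TYPES = {"ollama", "docker", "ssh"}  # from the access types table
GPU_COUNTS = {1, 2, 4, 8}                     # assumed from the /multi-gpu description


def build_deploy_body(gpu_type, gpu_count=1, instance_type="ollama",
                      models=None, volume_gb=None) -> str:
    """Return a JSON string suitable as the body of POST /deploy."""
    if instance_type not in INSTANCE_TYPES:
        raise ValueError(f"unknown instanceType: {instance_type!r}")
    if gpu_count not in GPU_COUNTS:
        raise ValueError(f"unsupported gpuCount: {gpu_count!r}")
    body = {
        "gpuType": gpu_type,
        "gpuCount": gpu_count,
        "instanceType": instance_type,
    }
    # Optional fields are only included when the caller sets them.
    if models:
        body["models"] = list(models)
    if volume_gb is not None:
        body["volumeGb"] = volume_gb
    return json.dumps(body)
```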

Serverless GPU API

Deploy inference endpoints that auto-scale to zero. Pay per second of compute, not per hour of uptime.

Base URL: https://computegpu.com/api/v1/serverless

Public (no auth)

Method	Endpoint	Description
GET	`/templates`	Pre-built handler templates (Ollama, vLLM, SDXL, Whisper, ComfyUI)
GET	`/pricing`	GPU pricing per second (with margin)

Authenticated (`Authorization: Bearer gpu_...`)

Method	Endpoint	Description
POST	`/endpoints`	Create serverless endpoint. Body: `{"name","templateId","gpuType","workersMax"}`
GET	`/endpoints`	List your endpoints with live metrics
GET	`/endpoints/:id`	Endpoint detail + workers + job history
PATCH	`/endpoints/:id`	Update scaling config (workersMin, workersMax, idleTimeout)
DELETE	`/endpoints/:id`	Delete endpoint and all workers
POST	`/endpoints/:id/run`	Sync inference (waits up to 30s for result)
POST	`/endpoints/:id/run-async`	Async job (returns job ID, poll for result)
GET	`/jobs/:jobId?endpointId=...`	Poll job status + output
POST	`/endpoints/:id/cancel/:jobId`	Cancel queued/running job

Example: Deploy + Inference

# Create an Ollama serverless endpoint
curl -X POST https://computegpu.com/api/v1/serverless/endpoints \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"name":"llama3-api","templateId":"tpl_ollama","gpuType":"AMPERE_24","workersMax":3}'

# Run sync inference
curl -X POST https://computegpu.com/api/v1/serverless/endpoints/ENDPOINT_ID/run \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"What is the capital of France?","max_tokens":100}}'

# Run async + poll (replace ... with the serverless base URL)
JOB_ID=$(curl -s -X POST .../endpoints/ENDPOINT_ID/run-async \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Explain quantum computing"}}' | jq -r '.jobId')

curl ".../jobs/$JOB_ID?endpointId=ENDPOINT_ID" \
  -H "Authorization: Bearer gpu_your_key"
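
Polling by hand gets tedious, so a client would usually wrap it in a loop. The sketch below shows one generic way to do that in Python. The job status strings (`COMPLETED`, `FAILED`, `CANCELLED`) are assumptions, since the response schema is not documented here; adjust them to whatever `/jobs/:jobId` actually returns. `fetch_job` is a caller-supplied function that performs the GET and returns the decoded JSON.

```python
import time
from typing import Callable

# Assumed terminal states; the real API may use different names.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(fetch_job: Callable[[], dict], interval: float = 1.0,
                 timeout: float = 300.0, sleep=time.sleep,
                 clock=time.monotonic) -> dict:
    """Call fetch_job() until the job reaches a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if str(job.get("status", "")).upper() in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real delays. In normal use they can be left at their defaults.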

Templates

Template	Use Case	Min VRAM
Ollama Serverless	Any Ollama model (Llama, Qwen, Gemma, Mistral)	8 GB
vLLM Serverless	High-throughput LLM serving (OpenAI-compatible)	16 GB
SDXL	Image generation (Stable Diffusion XL)	12 GB
Whisper	Audio transcription	8 GB
ComfyUI	ComfyUI workflows	12 GB
Custom	Bring your own Docker handler	4 GB

Billing

Billed per second of compute time (not wall clock)
Zero cost when scaled to zero (no idle charges)
Prepaid wallet: the cost is deducted when each job completes
Minimum $0.10 wallet balance to run jobs
Rate limit: 60 jobs/minute per user
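
To stay under the 60 jobs/minute limit, a client can throttle itself before submitting. Below is a minimal sliding-window limiter in Python. The class name is hypothetical, and it assumes the server counts jobs over a rolling 60-second window, which the limit above does not specify.

```python
import time
from collections import deque


class JobRateLimiter:
    """Client-side sliding-window limiter (assumed to match the server's window)."""

    def __init__(self, max_jobs: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self._max_jobs = max_jobs
        self._window = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of recently submitted jobs

    def try_acquire(self) -> bool:
        """Return True and record a submission if one more job is allowed now."""
        now = self._clock()
        # Forget submissions that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max_jobs:
            self._sent.append(now)
            return True
        return False
```

Calling `try_acquire()` before each `/run` or `/run-async` request, and waiting briefly when it returns `False`, keeps submissions within the window.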

GPU Clusters

Multi-node GPU clusters with InfiniBand interconnect. 2–32 instances per cluster, up to 8 GPUs per node.

Manage clusters via the dashboard. A Cluster API is coming soon; for now, clusters are available only through the dashboard UI.

Configuration

Setting	Range
Instance count	2–32
GPUs per instance	1–8
Max GPUs per cluster	256
Interconnect	InfiniBand (3,200 Gbps)
Billing	Per-second per instance
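
The limits in this table are related: 32 instances × 8 GPUs gives the 256-GPU maximum. A quick pre-flight check, sketched in Python with a hypothetical function name, makes that arithmetic explicit. It is a local validation only, not a call to any cluster API.

```python
MIN_INSTANCES, MAX_INSTANCES = 2, 32
MIN_GPUS_PER_INSTANCE, MAX_GPUS_PER_INSTANCE = 1, 8
MAX_GPUS_PER_CLUSTER = 256


def total_cluster_gpus(instances: int, gpus_per_instance: int) -> int:
    """Validate a cluster shape against the table and return its total GPU count."""
    if not MIN_INSTANCES <= instances <= MAX_INSTANCES:
        raise ValueError("instance count must be between 2 and 32")
    if not MIN_GPUS_PER_INSTANCE <= gpus_per_instance <= MAX_GPUS_PER_INSTANCE:
        raise ValueError("GPUs per instance must be between 1 and 8")
    total = instances * gpus_per_instance
    if total > MAX_GPUS_PER_CLUSTER:
        raise ValueError("cluster exceeds 256 GPUs")
    return total
```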

Network Storage

Persistent network volumes that survive instance restarts. Store models, datasets, and checkpoints. Attach to any instance or cluster.

Manage volumes via the dashboard. A Storage API is coming soon; for now, network storage is available only through the dashboard UI.

Features

Persistent across instance restarts and redeployments
Shared across multiple instances in a cluster
Resize anytime without data loss
Available in multiple regions

Wallet API

Prepaid credits for GPU rentals. Add funds via inline Stripe Elements (cards, Apple Pay, Google Pay).

Base URL: https://computegpu.com/wallet (session auth required)

Method	Endpoint	Description
POST	`/create-intent`	Create Stripe PaymentIntent. Body: `{"amount":10}`. Returns `clientSecret`
POST	`/confirm`	Verify payment + credit wallet. Body: `{"paymentIntentId":"pi_..."}`. Returns `balance`

Billing rules

Minimum top-up: $5
Wallet balance must cover at least one hour of the GPU's cost to start a rental
Pods: billed per second of uptime. Auto-stops at $0.
Serverless: billed per second of compute. $0 when idle.
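
These rules reduce to simple arithmetic. The Python sketch below checks the top-up minimum and the one-hour balance requirement, and estimates how long a pod can run before it auto-stops. Function names are hypothetical, and the code assumes the one-hour requirement is measured against the listed `pricePerHour`.

```python
import math

MIN_TOPUP_USD = 5.0
SECONDS_PER_HOUR = 3600


def is_valid_topup(amount_usd: float) -> bool:
    """A top-up must be at least $5."""
    return amount_usd >= MIN_TOPUP_USD


def can_start_rental(balance_usd: float, price_per_hour: float) -> bool:
    """The wallet must hold at least one hour of the GPU's cost."""
    return balance_usd >= price_per_hour


def seconds_until_auto_stop(balance_usd: float, price_per_hour: float) -> int:
    """Whole seconds of uptime the balance covers before it reaches $0."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return math.floor(balance_usd * SECONDS_PER_HOUR / price_per_hour)
```

Floating-point dollars are fine for an estimate like this. Code that reconciles actual charges would be better served by `decimal.Decimal`.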

Agent Onboarding

Three commands, zero config: your agent gets a GPU in about 30 seconds.

Quick start

npm i -g computegpu
computegpu signup dev@company.com my-agent   # → gpu_ API key instantly
computegpu deploy --gpu "RTX 4090"           # → GPU running in ~30s

Full agent lifecycle

computegpu types                     # Browse 47 GPU models
computegpu cheapest --vram 24        # Find cheapest for VRAM
computegpu deploy --gpu "RTX 4090"   # Deploy instance
computegpu pods                      # List active instances
computegpu status POD_ID             # Check status + billing
computegpu stop POD_ID               # Pause (billing stops)
computegpu destroy POD_ID            # Permanent destroy
computegpu balance                   # Check wallet
computegpu topup 10                  # Request $10 top-up

MCP (Claude Code / Cursor)

# Add to .mcp.json:
{
  "mcpServers": {
    "computegpu": {
      "type": "stdio",
      "command": "computegpu",
      "args": ["mcp-serve"]
    }
  }
}
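
If a project already has an `.mcp.json` with other servers, the entry can be merged in rather than pasted by hand. The Python sketch below does that. `add_computegpu_server` is a hypothetical helper, and the server definition is copied from the snippet above.

```python
import json
from pathlib import Path

COMPUTEGPU_SERVER = {
    "type": "stdio",
    "command": "computegpu",
    "args": ["mcp-serve"],
}


def add_computegpu_server(config_path: Path) -> dict:
    """Add (or overwrite) the computegpu entry while keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})["computegpu"] = dict(COMPUTEGPU_SERVER)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    add_computegpu_server(Path(".mcp.json"))
```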

How it works

computegpu signup creates an account + returns a gpu_ API key instantly
Account owner receives email — verifies with $1 (credited to wallet as starting balance)
Agent can now deploy, stop, destroy GPUs via CLI, MCP, or REST API
Per-second billing. $1 starting credit. Stop anytime.

REST API (programmatic alternative)

Method	Endpoint	Description
POST	`/api/signup`	Create account. Body: `{"email":"user@co.com","tenantId":"my-agent"}`
GET	`/api/signup/plans`	Available plans + pricing

τAI Chat API

AI-powered support chat. 14 domain experts for GPUs, billing, deployments, and more.

Base URL: https://computegpu.com/api/support

Method	Endpoint	Description
POST	`/message`	Send message to τAI. Body: `{"message":"...","sessionId":"..."}`
GET	`/history?sessionId=...`	Load chat history (max 50 messages, 24h TTL)

Example

curl -X POST https://computegpu.com/api/support/message \
  -H "Content-Type: application/json" \
  -d '{"message":"How much is an RTX 4090?","sessionId":"my-session"}'

# Returns: {"answer":"...","experts":["marketplace"],"fastPath":true}

Instance Types

Ollama API

Drop-in Ollama endpoint. Send requests to the proxy, get responses. Zero config.

Best for: LLM inference, chatbots, RAG pipelines

Docker

Run any Docker image on the host GPU. SSH access + optional Jupyter. Use templates for pre-built stacks.

Best for: Training, fine-tuning, custom workloads

SSH

Full shell access to the host machine. Bring your own tools, run anything.

Best for: Research, custom setups, multi-GPU workflows

Billing

Per-second billing

Everything is billed per second. Prices displayed as $/hr for readability. Stop after 47 seconds? Pay for 47 seconds.
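
Converting an hourly price to a per-second charge is a division by 3,600. The Python sketch below works through the 47-second case using the $0.51/hr rate from the deploy response example. `cost_for_seconds` is a hypothetical helper, and how the platform rounds fractional cents is not specified here, so the result is left unrounded.

```python
SECONDS_PER_HOUR = 3600


def cost_for_seconds(price_per_hour: float, seconds: float) -> float:
    """Cost of running for `seconds` at a price quoted in $/hr."""
    if price_per_hour < 0 or seconds < 0:
        raise ValueError("price and duration must be non-negative")
    return price_per_hour * seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 47 seconds on a $0.51/hr GPU
    print(f"${cost_for_seconds(0.51, 47):.4f}")
```

At that rate, 47 seconds comes to roughly $0.0067, well under one cent.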

Product billing

Product	Billed on	Idle cost
GPU Instances	Per second of uptime	Continues until paused/destroyed
Serverless	Per second of compute	$0 when scaled to zero
Clusters	Per second per instance	Continues until destroyed

Pricing tiers

Tier	Price	Guarantee
On-Demand	Listed price	Cannot be preempted
Spot	~50% off	Can be preempted (5 min grace)

Wallet

Prepaid credits — add via inline Stripe (cards, Apple Pay, Google Pay)
Minimum top-up: $5. Presets: $5, $10, $25, $50, $100
Instance auto-stops when balance hits $0
Serverless requires $0.10 minimum to run jobs

Templates

970+ pre-built templates for pods and serverless endpoints. Browse the Template Hub.

Serverless templates — Ollama, vLLM, SDXL, Whisper, ComfyUI, SGLang, Faster Whisper, and more
Instance templates — PyTorch, TensorFlow, JupyterLab, Ubuntu, and 940+ community templates

Each template specifies a Docker image, minimum VRAM, and access type (Jupyter/SSH). Pick one, choose a GPU, deploy.

Open Source

ComputeGpu is built on open-source infrastructure and contributes back to the ecosystem.

Tyga.Cloud ecosystem

Project	What it does	License
easytyga	AI inference tunnel. Exposes local Ollama/vLLM to the internet with one command (`npx easytyga`). Auto GPU detection, built-in API key auth. Powers host connectivity on ComputeGpu.	MIT
Agentic Memory	Long-term memory for AI agents. Persistent conversation context across sessions. Powers the optional memory layer on GPU rentals.	Proprietary

Projects we use and thank

Project	How we use it
Ollama	The inference runtime that GPU hosts run. Our proxy speaks the native Ollama API, so every rental is an Ollama endpoint.
HuggingFace	Model hub. Hosts pull GGUF models from the HuggingFace Hub to serve on their GPUs.
llama.cpp	The C++ inference engine that Ollama wraps. Makes local GPU inference fast and efficient.
h-network LLN	LLM-Native Notation by Halil Ibrahim Baysal. Compact context encoding that reduces memory token costs.
