Developer Docs
Everything you need to use computegpu.com programmatically.
API Keys
Two types of API keys:
| Prefix | Scope | Use |
|---|---|---|
| gpu_ | Account-wide | CLI login, REST API (rent, stop, list rentals) |
| (hex, no prefix) | Per-rental | Ollama proxy access (/api/v1/ollama/:rentalId/*) |
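The two key types are scoped differently, so a client that holds both can check which one it is about to send. Below is a minimal sketch. It assumes a per-rental key contains only hexadecimal characters, since the table says only "hex, no prefix". `key_scope` is a hypothetical helper and is not part of the computegpu CLI.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def key_scope(key: str) -> str:
    """Classify a computegpu API key by its prefix (hypothetical helper)."""
    if key.startswith("gpu_"):
        return "account"  # account-wide: CLI login and marketplace REST API
    # Assumption: per-rental keys are plain hex strings with no prefix
    if key and set(key) <= HEX_DIGITS:
        return "rental"  # per-rental: Ollama proxy access only
    raise ValueError("unrecognised API key format")


print(key_scope("gpu_your_api_key_here"))  # account
```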
Generate an account key from Settings → API Keys or via the API:
POST /auth/api-key
Content-Type: application/json
Cookie: (session)
{"label": "My CLI Key"}
CLI Tool
npm install -g computegpu
Setup
computegpu login gpu_your_api_key_here
Commands
| Command | Description |
|---|---|
| computegpu models | List GPU models with availability and pricing |
| computegpu search --gpu RTX4090 | Search available listings |
| computegpu rent <listingId> | Rent a GPU, get API key + endpoint |
| computegpu rentals | List active rentals |
| computegpu status <rentalId> | Rental usage stats |
| computegpu stop <rentalId> | Stop and settle |
Search flags
computegpu search \
--gpu RTX4090 \
--max-price 1.00 \
--vram 24 \
--model llama3 \
--sort price
REST API
Base URL: https://computegpu.com/api/v1/marketplace
Public (no auth)
| Method | Endpoint | Description |
|---|---|---|
| GET | /search | Search listings. Params: gpu, maxPrice, minVram, ollamaModel, instanceType, pricingTier, sort, limit, offset |
| GET | /gpu-models | GPU catalogue summary |
| GET | /listings/:id | Listing detail + health |
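The `/search` parameters map directly onto a query string. The sketch below uses only the Python standard library. `search_url` is a hypothetical helper. It checks parameter names against the list above, but it does not check which values the server accepts.

```python
from urllib.parse import urlencode

BASE = "https://computegpu.com/api/v1/marketplace"

# Query parameters documented for GET /search
SEARCH_PARAMS = {"gpu", "maxPrice", "minVram", "ollamaModel", "instanceType",
                 "pricingTier", "sort", "limit", "offset"}


def search_url(**params) -> str:
    """Build a GET /search URL, rejecting undocumented parameter names."""
    unknown = set(params) - SEARCH_PARAMS
    if unknown:
        raise ValueError(f"unknown search params: {sorted(unknown)}")
    # Drop None values so optional filters can be passed unconditionally
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/search?{query}" if query else f"{BASE}/search"


print(search_url(gpu="RTX4090", maxPrice=1.00, minVram=24))
```

Keyword arguments keep their order, so the query string lists the filters in the order you pass them.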
Authenticated (Authorization: Bearer gpu_...)
| Method | Endpoint | Description |
|---|---|---|
| POST | /rent/:listingId | Rent a GPU. Returns rental ID + proxy API key |
| POST | /rentals/:id/stop | Stop rental. Returns cost summary |
| GET | /rentals | List your rentals. Param: status |
| GET | /rentals/:id | Rental detail + usage |
Example: Search + Rent
# Search
curl "https://computegpu.com/api/v1/marketplace/search?gpu=RTX4090"
# Rent
curl -X POST https://computegpu.com/api/v1/marketplace/rent/LISTING_ID \
-H "Authorization: Bearer gpu_your_key"
# Use the GPU
curl https://computegpu.com/api/v1/ollama/RENTAL_ID/api/chat \
-H "Authorization: Bearer RENTAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
# Stop
curl -X POST https://computegpu.com/api/v1/marketplace/rentals/RENTAL_ID/stop \
-H "Authorization: Bearer gpu_your_key"
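Here is the same flow in Python. It uses `urllib.request.Request` objects so the key handling is explicit. The account key (`gpu_...`) is used to rent and stop. Only the per-rental key touches the Ollama proxy. This is a sketch and the helper names are hypothetical. The rent response's field names are not documented here, so read the rental ID and proxy key from a real response.

```python
import json
from urllib.request import Request

MARKETPLACE = "https://computegpu.com/api/v1/marketplace"
OLLAMA_PROXY = "https://computegpu.com/api/v1/ollama"


def rent_request(listing_id: str, account_key: str) -> Request:
    """POST /rent/:listingId with the account-wide gpu_ key."""
    return Request(f"{MARKETPLACE}/rent/{listing_id}", method="POST",
                   headers={"Authorization": f"Bearer {account_key}"})


def chat_request(rental_id: str, rental_key: str, model: str, prompt: str) -> Request:
    """Ollama /api/chat through the proxy, using the per-rental key."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return Request(f"{OLLAMA_PROXY}/{rental_id}/api/chat",
                   data=json.dumps(body).encode(), method="POST",
                   headers={"Authorization": f"Bearer {rental_key}",
                            "Content-Type": "application/json"})


def stop_request(rental_id: str, account_key: str) -> Request:
    """POST /rentals/:id/stop with the account key to stop and settle."""
    return Request(f"{MARKETPLACE}/rentals/{rental_id}/stop", method="POST",
                   headers={"Authorization": f"Bearer {account_key}"})
```

To send a request, pass it to `urllib.request.urlopen`. Building the requests separately from sending them keeps the key routing easy to test.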
GPU Cloud API
Deploy, manage, and destroy cloud GPU instances programmatically. Supports single-GPU and multi-GPU (up to 8x) configurations.
Base URL: https://computegpu.com/api/v1/gpu
Public (no auth)
| Method | Endpoint | Description |
|---|---|---|
| GET | /types | All GPU types with live pricing. Returns pricePerHour, spotPricePerHour, vramGb |
| GET | /cheapest?vram=24 | Find cheapest GPU for a VRAM requirement |
| GET | /multi-gpu | Multi-GPU configs with pricing per GPU count (1/2/4/8x), interconnect type, max count |
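`GET /cheapest` does this selection on the server. If you already have the `/types` response, you can do the same selection locally with a filter and a `min`. This sketch assumes each `/types` entry is an object with the documented `pricePerHour`, `spotPricePerHour` and `vramGb` fields. It does not rely on any other field. The docs do not say how the server breaks ties or handles missing prices, so results may differ from `/cheapest`.

```python
from typing import Optional


def cheapest_for_vram(types: list[dict], min_vram_gb: int,
                      spot: bool = False) -> Optional[dict]:
    """Return the lowest-priced GPU type with at least min_vram_gb of VRAM."""
    price_key = "spotPricePerHour" if spot else "pricePerHour"
    candidates = [t for t in types
                  if t["vramGb"] >= min_vram_gb and t.get(price_key) is not None]
    # default=None avoids a ValueError when nothing meets the VRAM requirement
    return min(candidates, key=lambda t: t[price_key], default=None)
```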
Authenticated (Authorization: Bearer gpu_...)
| Method | Endpoint | Description |
|---|---|---|
| POST | /deploy | Deploy on-demand GPU instance. Body: {"gpuType":"RTX 4090","gpuCount":1,"models":["llama3.3"],"instanceType":"ollama","volumeGb":50} |
| POST | /deploy-spot | Deploy spot GPU (~50% off, preemptible). Body: {"gpuType":"RTX 4090","gpuCount":1,"bidPerGpu":0.25} |
| GET | /pods | List your active cloud GPU instances |
| GET | /pods/:id | Instance status + live billing |
| PUT | /pods/:id/stop | Pause instance (billing pauses, volume kept) |
| PUT | /pods/:id/resume | Resume paused instance |
| DELETE | /pods/:id | Destroy instance permanently. Returns final cost |
Example: Deploy + Use + Destroy
# Browse GPU types
curl https://computegpu.com/api/v1/gpu/types
# Deploy an RTX 4090 with Ollama
curl -X POST https://computegpu.com/api/v1/gpu/deploy \
-H "Authorization: Bearer gpu_your_key" \
-H "Content-Type: application/json" \
-d '{"gpuType":"RTX 4090","models":["llama3.3","gemma4:e4b"]}'
# Check instance status
curl https://computegpu.com/api/v1/gpu/pods/POD_ID \
-H "Authorization: Bearer gpu_your_key"
# Pause billing (keep volume)
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/stop \
-H "Authorization: Bearer gpu_your_key"
# Resume
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/resume \
-H "Authorization: Bearer gpu_your_key"
# Destroy (permanent, final billing)
curl -X DELETE https://computegpu.com/api/v1/gpu/pods/POD_ID \
-H "Authorization: Bearer gpu_your_key"
Deploy response
{
"instance": {
"id": "68abc123def456",
"gpuType": "RTX 4090",
"vramGb": 24,
"pricePerHour": 0.51,
"instanceType": "ollama",
"status": "deploying",
"pricingTier": "on-demand",
"startedAt": "2026-04-30T15:00:00.000Z"
}
}
Access types
Set instanceType in the deploy body:
| Type | What you get |
|---|---|
| ollama | Ollama-compatible API endpoint via proxy (default) |
| docker | Docker container with SSH + Jupyter access |
| ssh | Full shell access to the GPU machine |
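A small builder can catch a bad `instanceType` or GPU count before the request is sent. This sketch uses the `/deploy` field names from the table above. It assumes, without confirmation from the docs, that only the GPU counts priced by `GET /multi-gpu` (1, 2, 4, 8) are accepted. `deploy_body` is a hypothetical helper.

```python
import json
from typing import Optional

INSTANCE_TYPES = {"ollama", "docker", "ssh"}
# Assumption: only the GPU counts priced by GET /multi-gpu (1, 2, 4, 8) are accepted
GPU_COUNTS = {1, 2, 4, 8}


def deploy_body(gpu_type: str, gpu_count: int = 1, instance_type: str = "ollama",
                models: Optional[list[str]] = None,
                volume_gb: Optional[int] = None) -> str:
    """Build a JSON body for POST /api/v1/gpu/deploy (hypothetical helper)."""
    if instance_type not in INSTANCE_TYPES:
        raise ValueError(f"instanceType must be one of {sorted(INSTANCE_TYPES)}")
    if gpu_count not in GPU_COUNTS:
        raise ValueError(f"gpuCount must be one of {sorted(GPU_COUNTS)}")
    body = {"gpuType": gpu_type, "gpuCount": gpu_count, "instanceType": instance_type}
    # Optional fields are sent only when set, so server defaults still apply
    if models:
        body["models"] = models
    if volume_gb is not None:
        body["volumeGb"] = volume_gb
    return json.dumps(body)
```

`deploy_body("RTX 4090", models=["llama3.3"], volume_gb=50)` produces the same fields as the `/deploy` example body in the table.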
Serverless GPU API
Deploy inference endpoints that auto-scale to zero. Pay per second of compute, not per hour of uptime.
Base URL: https://computegpu.com/api/v1/serverless
Public (no auth)
| Method | Endpoint | Description |
|---|---|---|
| GET | /templates | Pre-built handler templates (Ollama, vLLM, SDXL, Whisper, ComfyUI) |
| GET | /pricing | GPU pricing per second (includes platform margin) |
Authenticated (Authorization: Bearer gpu_...)
| Method | Endpoint | Description |
|---|---|---|
| POST | /endpoints | Create serverless endpoint. Body: {"name":"llama3-api","templateId":"tpl_ollama","gpuType":"AMPERE_24","workersMax":3} |
| GET | /endpoints | List your endpoints with live metrics |
| GET | /endpoints/:id | Endpoint detail + workers + job history |
| PATCH | /endpoints/:id | Update scaling config (workersMin, workersMax, idleTimeout) |
| DELETE | /endpoints/:id | Delete endpoint and all workers |
| POST | /endpoints/:id/run | Sync inference (waits up to 30s for result) |
| POST | /endpoints/:id/run-async | Async job (returns job ID, poll for result) |
| GET | /jobs/:jobId?endpointId=... | Poll job status + output |
| POST | /endpoints/:id/cancel/:jobId | Cancel queued/running job |
Example: Deploy + Inference
# Create an Ollama serverless endpoint
curl -X POST https://computegpu.com/api/v1/serverless/endpoints \
-H "Authorization: Bearer gpu_your_key" \
-H "Content-Type: application/json" \
-d '{"name":"llama3-api","templateId":"tpl_ollama","gpuType":"AMPERE_24","workersMax":3}'
# Run sync inference
curl -X POST https://computegpu.com/api/v1/serverless/endpoints/ENDPOINT_ID/run \
-H "Authorization: Bearer gpu_your_key" \
-H "Content-Type: application/json" \
-d '{"input":{"prompt":"What is the capital of France?","max_tokens":100}}'
# Run async + poll
JOB_ID=$(curl -s -X POST .../endpoints/ENDPOINT_ID/run-async \
-H "Authorization: Bearer gpu_your_key" \
-H "Content-Type: application/json" \
-d '{"input":{"prompt":"Explain quantum computing"}}' | jq -r '.jobId')
curl ".../jobs/$JOB_ID?endpointId=ENDPOINT_ID" \
-H "Authorization: Bearer gpu_your_key"
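You can wrap the poll step in a loop with a timeout. This is a sketch built on the documented base URL and `/jobs/:jobId?endpointId=...` path. The job object's field name (`status`) and its terminal values are assumptions, so check them against a real response. `job_request` and `poll_job` are hypothetical helpers.

```python
import time
from typing import Callable
from urllib.parse import urlencode
from urllib.request import Request

SERVERLESS = "https://computegpu.com/api/v1/serverless"
# Assumption: the job object has a "status" field with these terminal values.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def job_request(job_id: str, endpoint_id: str, api_key: str) -> Request:
    """Build GET /jobs/:jobId?endpointId=... authenticated with the account key."""
    query = urlencode({"endpointId": endpoint_id})
    return Request(f"{SERVERLESS}/jobs/{job_id}?{query}",
                   headers={"Authorization": f"Bearer {api_key}"})


def poll_job(fetch: Callable[[], dict], interval: float = 1.0,
             timeout: float = 120.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the job reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

In real use, `fetch` could be `lambda: json.load(urllib.request.urlopen(job_request(job_id, endpoint_id, key)))`. Keeping the HTTP call outside `poll_job` means the loop can be tested without a network. A fixed interval is the simplest choice. Exponential backoff would reduce request volume on long jobs.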
Templates
| Template | Use Case | Min VRAM |
|---|---|---|
| Ollama Serverless | Any Ollama model (Llama, Qwen, Gemma, Mistral) | 8 GB |
| vLLM Serverless | High-throughput LLM serving (OpenAI-compatible) | 16 GB |
| SDXL | Image generation (Stable Diffusion XL) | 12 GB |
| Whisper | Audio transcription | 8 GB |
| ComfyUI | ComfyUI workflows | 12 GB |
| Custom | Bring your own Docker handler | 4 GB |
Billing
- Billed per second of compute time (not wall clock)
- Zero cost when scaled to zero (no idle charges)
- Prepaid wallet: cost is deducted as each job completes
- Minimum $0.10 wallet balance to run jobs
- Rate limit: 60 jobs/minute per user
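Clients that submit many jobs can stay under the 60 jobs/minute limit by throttling themselves. Below is a sketch of a sliding-window limiter. It assumes the limit counts job submissions (`/run` and `/run-async`) over any rolling 60-second window. The server's exact counting method is not documented. `JobRateLimiter` is a hypothetical class.

```python
import time
from collections import deque


class JobRateLimiter:
    """Client-side sliding-window limiter: at most max_jobs submissions per window seconds."""

    def __init__(self, max_jobs: int = 60, window: float = 60.0, clock=time.monotonic):
        self.max_jobs = max_jobs
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent submissions, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next submission is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget submissions that have left the rolling window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_jobs:
            return 0.0
        # Wait until the oldest submission in the window expires
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call immediately after submitting a job."""
        self.sent.append(self.clock())
```

Call `time.sleep(limiter.wait_time())` before each submission and `limiter.record()` after it. Fixed per-minute buckets would allow up to 120 jobs in a burst that straddles a minute boundary. A sliding window avoids that.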
GPU Clusters
Multi-node GPU clusters with InfiniBand interconnect. 2–32 instances per cluster, up to 8 GPUs per node.
Clusters are currently managed through the dashboard UI; a Cluster API is coming soon.
Configuration
| Setting | Range |
|---|---|
| Instance count | 2–32 |
| GPUs per instance | 1–8 |
| Max GPUs per cluster | 256 |
| Interconnect | InfiniBand (3,200 Gbps) |
| Billing | Per-second per instance |
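Until the Cluster API ships, you choose a cluster shape in the dashboard. The limits above are still easy to check up front, for example in a script that plans jobs. This is a sketch, and `validate_cluster` is a hypothetical helper that encodes only the ranges in this table.

```python
def validate_cluster(instances: int, gpus_per_instance: int) -> int:
    """Check a cluster shape against the documented limits; return the total GPU count."""
    if not 2 <= instances <= 32:
        raise ValueError("instance count must be between 2 and 32")
    if not 1 <= gpus_per_instance <= 8:
        raise ValueError("GPUs per instance must be between 1 and 8")
    total = instances * gpus_per_instance
    # Redundant today (32 x 8 = 256), kept in case the per-field limits change
    if total > 256:
        raise ValueError("a cluster can have at most 256 GPUs")
    return total
```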
Network Storage
Persistent network volumes that survive instance restarts. Store models, datasets, and checkpoints. Attach to any instance or cluster.
Network volumes are currently managed through the dashboard UI; a Storage API is coming soon.
Features
- Persistent across instance restarts and redeployments
- Shared across multiple instances in a cluster
- Resize anytime without data loss
- Available in multiple regions
Wallet API
Prepaid credits for GPU rentals. Add funds via inline Stripe Elements (cards, Apple Pay, Google Pay).
Base URL: https://computegpu.com (paths below are absolute; session auth required)
| Method | Endpoint | Description |
|---|---|---|
| POST | /wallet/create-intent | Create Stripe PaymentIntent. Body: {"amount":10}. Returns clientSecret |
| POST | /wallet/confirm | Verify payment + credit wallet. Body: {"paymentIntentId":"pi_..."}. Returns balance |
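Topping up takes two steps. First, `create-intent` returns a Stripe `clientSecret`, which the browser-side Stripe Elements form uses to collect payment. Then `confirm` credits the wallet. The sketch below covers the two server calls and rests on these assumptions:

- the table paths are absolute on https://computegpu.com;
- `amount` is in dollars;
- the caller passes its logged-in session cookie verbatim.

The helpers are hypothetical. The card step itself happens in Stripe Elements, not here.

```python
import json
from urllib.request import Request

SITE = "https://computegpu.com"  # assumption: table paths are absolute on this host
MIN_TOP_UP = 5  # dollars, per the billing rules; assumes "amount" is in dollars


def create_intent_request(amount: float, cookie: str) -> Request:
    """POST /wallet/create-intent; the response's clientSecret goes to Stripe Elements."""
    if amount < MIN_TOP_UP:
        raise ValueError(f"minimum top-up is ${MIN_TOP_UP}")
    return Request(f"{SITE}/wallet/create-intent", method="POST",
                   data=json.dumps({"amount": amount}).encode(),
                   headers={"Content-Type": "application/json", "Cookie": cookie})


def confirm_request(payment_intent_id: str, cookie: str) -> Request:
    """POST /wallet/confirm to verify the PaymentIntent and credit the wallet."""
    return Request(f"{SITE}/wallet/confirm", method="POST",
                   data=json.dumps({"paymentIntentId": payment_intent_id}).encode(),
                   headers={"Content-Type": "application/json", "Cookie": cookie})
```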
Billing rules
- Minimum top-up: $5
- Must have ≥1 hour's GPU cost to start a rental
- Pods: billed per second of uptime. Auto-stops at $0.
- Serverless: billed per second of compute. $0 when idle.
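Both the start rule and the $0 auto-stop are simple arithmetic on the hourly price. This sketch assumes a single pod with no other charges against the wallet. Both helpers are hypothetical.

```python
def can_start(balance: float, price_per_hour: float) -> bool:
    """A rental needs at least one hour's GPU cost in the wallet."""
    return balance >= price_per_hour


def seconds_until_auto_stop(balance: float, price_per_hour: float) -> int:
    """Uptime a pod can accrue before per-second billing drains the wallet to $0."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return int(balance / price_per_hour * 3600)
```

For example, a $5 balance on a $0.51/hr GPU passes `can_start`. `seconds_until_auto_stop(5, 0.51)` returns 35294 seconds, just over 9.8 hours.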
Agent Signup
Programmatic provisioning for AI agents. Provide an email — we send the account owner a signup link. They create an account, add credits, and get an API key.
Base URL: https://computegpu.com
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/signup | Send signup invite. Body: {"email":"user@co.com","tenantId":"my-agent"} |
| GET | /api/signup/plans | Available plans + pricing |
Example
# Request agent account (emails the human a signup link)
curl -X POST https://computegpu.com/api/signup \
-H "Content-Type: application/json" \
-d '{"email":"dev@company.com","tenantId":"my-agent","agent":"claude-code"}'
# Human clicks link → creates account → tops up wallet → gets API key
# Use the key for all GPU Cloud + Serverless operations
How it works
- Agent calls POST /api/signup with an email + tenant ID
- Account owner receives email with a signup link
- Owner creates account, adds credits via credit card (minimum $5 top-up)
- API key available on dashboard — share with your agent
- Per-second billing. No free credit. Pay only for what you use.
τAI Chat API
AI-powered support chat. 14 domain experts for GPUs, billing, deployments, and more.
Base URL: https://computegpu.com
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/support/message | Send message to τAI. Body: {"message":"...","sessionId":"..."} |
| GET | /api/support/history?sessionId=... | Load chat history (max 50 messages, 24h TTL) |
Example
curl -X POST https://computegpu.com/api/support/message \
-H "Content-Type: application/json" \
-d '{"message":"How much is an RTX 4090?","sessionId":"my-session"}'
# Returns: {"answer":"...","experts":["marketplace"],"fastPath":true}
Instance Types
Ollama API
Drop-in Ollama endpoint. Send requests to the proxy, get responses. Zero config.
Best for: LLM inference, chatbots, RAG pipelines
Docker
Run any Docker image on the host GPU. SSH access + optional Jupyter. Use templates for pre-built stacks.
Best for: Training, fine-tuning, custom workloads
SSH
Full shell access to the host machine. Bring your own tools, run anything.
Best for: Research, custom setups, multi-GPU workflows
Billing
Per-second billing
Everything is billed per second. Prices are displayed in $/hr for readability. Stop after 47 seconds? Pay for 47 seconds.
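As a worked example, take the $0.51/hr RTX 4090 price from the deploy response example above. This is an illustrative figure, not a quote. At that price, a 47-second run costs:

$$
\text{cost} = p_{\text{hr}} \times \frac{t}{3600} = 0.51 \times \frac{47}{3600} \approx \$0.0067
$$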
Product billing
| Product | Billed on | Idle cost |
|---|---|---|
| GPU Instances | Per second of uptime | Continues until paused/destroyed |
| Serverless | Per second of compute | $0 when scaled to zero |
| Clusters | Per second per instance | Continues until destroyed |
Pricing tiers
| Tier | Price | Guarantee |
|---|---|---|
| On-Demand | Listed price | Cannot be preempted |
| Spot | ~50% off | Can be preempted (5 min grace) |
Wallet
- Prepaid credits — add via inline Stripe (cards, Apple Pay, Google Pay)
- Minimum top-up: $5. Presets: $5, $10, $25, $50, $100
- Instance auto-stops when balance hits $0
- Serverless requires $0.10 minimum to run jobs
Templates
970+ pre-built templates for pods and serverless endpoints. Browse the Template Hub.
- Serverless templates — Ollama, vLLM, SDXL, Whisper, ComfyUI, SGLang, Faster Whisper, and more
- Instance templates — PyTorch, TensorFlow, JupyterLab, Ubuntu, and 940+ community templates
Each template specifies a Docker image, minimum VRAM, and access type (Jupyter/SSH). Pick one, choose a GPU, deploy.
Open Source
ComputeGpu is built on open-source infrastructure and contributes back to the ecosystem.
Tyga.Cloud ecosystem
| Project | What it does | License |
|---|---|---|
| easytyga | AI inference tunnel. Exposes local Ollama/vLLM to the internet with one command (npx easytyga). Auto GPU detection, built-in API key auth. Powers host connectivity on ComputeGpu. | MIT |
| Agentic Memory | Long-term memory for AI agents. Persistent conversation context across sessions. Powers the optional memory layer on GPU rentals. | Proprietary |
Projects we use and thank
| Project | How we use it |
|---|---|
| Ollama | The inference runtime that GPU hosts run. Our proxy speaks the native Ollama API, so every rental is an Ollama endpoint. |
| HuggingFace | Model hub. Hosts pull GGUF models from the HuggingFace Hub to serve on their GPUs. |
| llama.cpp | The C++ inference engine that Ollama wraps. Makes local GPU inference fast and efficient. |
| h-network LLN | LLM-Native Notation by Halil Ibrahim Baysal. Compact context encoding that reduces memory token costs. |