Developer Docs

Everything you need to use computegpu.com programmatically.

API Keys

Two types of API keys:

Prefix | Scope | Use
gpu_ | Account-wide | CLI login, REST API (rent, stop, list rentals)
(hex, no prefix) | Per-rental | Ollama proxy access (/api/v1/ollama/:rentalId/*)
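
Because the two key types go to different endpoints, a client may want to check which kind it holds before sending a request. The sketch below assumes only what the table states: account keys start with gpu_ and per-rental keys are bare hexadecimal strings. The function name classify_api_key is hypothetical, and key lengths are not documented, so none is enforced.

```python
import string


def classify_api_key(key: str) -> str:
    """Return 'account' for gpu_-prefixed keys and 'rental' for bare hex keys."""
    if key.startswith("gpu_") and len(key) > len("gpu_"):
        return "account"
    # Assumption: a per-rental key consists only of hexadecimal characters.
    if key and all(ch in string.hexdigits for ch in key):
        return "rental"
    raise ValueError("unrecognized API key format")
```

An account key belongs in the Authorization header for marketplace and GPU Cloud calls; a rental key belongs in the header for the Ollama proxy of that one rental.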

Generate an account key from Settings → API Keys or via the API:

POST /auth/api-key
Content-Type: application/json
Cookie: (session)

{"label": "My CLI Key"}

CLI Tool

npm install -g computegpu
Setup
computegpu login gpu_your_api_key_here
Commands
Command | Description
computegpu models | List GPU models with availability and pricing
computegpu search --gpu RTX4090 | Search available listings
computegpu rent <listingId> | Rent a GPU, get API key + endpoint
computegpu rentals | List active rentals
computegpu status <rentalId> | Rental usage stats
computegpu stop <rentalId> | Stop and settle
Search flags
computegpu search \
  --gpu RTX4090 \
  --max-price 1.00 \
  --vram 24 \
  --model llama3 \
  --sort price

REST API

Base URL: https://computegpu.com/api/v1/marketplace

Public (no auth)
Method | Endpoint | Description
GET | /search | Search listings. Params: gpu, maxPrice, minVram, ollamaModel, instanceType, pricingTier, sort, limit, offset
GET | /gpu-models | GPU catalogue summary
GET | /listings/:id | Listing detail + health
Authenticated (Authorization: Bearer gpu_...)
Method | Endpoint | Description
POST | /rent/:listingId | Rent a GPU. Returns rental ID + proxy API key
POST | /rentals/:id/stop | Stop rental. Returns cost summary
GET | /rentals | List your rentals. Param: status
GET | /rentals/:id | Rental detail + usage
Example: Search + Rent
# Search
curl "https://computegpu.com/api/v1/marketplace/search?gpu=RTX4090"

# Rent
curl -X POST https://computegpu.com/api/v1/marketplace/rent/LISTING_ID \
  -H "Authorization: Bearer gpu_your_key"

# Use the GPU
curl https://computegpu.com/api/v1/ollama/RENTAL_ID/api/chat \
  -H "Authorization: Bearer RENTAL_API_KEY" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'

# Stop
curl -X POST https://computegpu.com/api/v1/marketplace/rentals/RENTAL_ID/stop \
  -H "Authorization: Bearer gpu_your_key"
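
The same search and rent calls can be sketched in Python using only the standard library. The base URL, paths, and query parameter names come from the tables above; the helper names (search_url, rent_request, send) are hypothetical, and the shape of the JSON responses is not documented here, so send simply returns whatever the server sends back.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "https://computegpu.com/api/v1/marketplace"


def search_url(**params) -> str:
    """Build a /search URL, dropping filters that were left unset."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/search?{query}" if query else f"{BASE}/search"


def rent_request(listing_id: str, account_key: str) -> urllib.request.Request:
    """Build the authenticated POST /rent/:listingId request (not sent yet)."""
    return urllib.request.Request(
        f"{BASE}/rent/{listing_id}",
        method="POST",
        headers={"Authorization": f"Bearer {account_key}"},
    )


def send(req):
    """Send a request (a URL string or a Request) and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, send(search_url(gpu="RTX4090", maxPrice=1.0)) would run the public search, and send(rent_request(listing_id, key)) would start a rental.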

GPU Cloud API

Deploy, manage, and destroy cloud GPU instances programmatically. Supports single-GPU and multi-GPU (up to 8x) configurations.

Base URL: https://computegpu.com/api/v1/gpu

Public (no auth)
Method | Endpoint | Description
GET | /types | All GPU types with live pricing. Returns pricePerHour, spotPricePerHour, vramGb
GET | /cheapest?vram=24 | Find the cheapest GPU for a VRAM requirement
GET | /multi-gpu | Multi-GPU configs with pricing per GPU count (1/2/4/8x), interconnect type, max count
Authenticated (Authorization: Bearer gpu_...)
Method | Endpoint | Description
POST | /deploy | Deploy on-demand GPU instance. Body: {"gpuType":"RTX 4090","gpuCount":1,"models":["llama3.3"],"instanceType":"ollama","volumeGb":50}
POST | /deploy-spot | Deploy spot GPU (50% off, preemptible). Body: {"gpuType":"RTX 4090","gpuCount":1,"bidPerGpu":0.25}
GET | /pods | List your active cloud GPU instances
GET | /pods/:id | Instance status + live billing
PUT | /pods/:id/stop | Pause instance (billing pauses, volume kept)
PUT | /pods/:id/resume | Resume paused instance
DELETE | /pods/:id | Destroy instance permanently. Returns final cost
Example: Deploy + Use + Destroy
# Browse GPU types
curl https://computegpu.com/api/v1/gpu/types

# Deploy an RTX 4090 with Ollama
curl -X POST https://computegpu.com/api/v1/gpu/deploy \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"gpuType":"RTX 4090","models":["llama3.3","gemma4:e4b"]}'

# Check instance status
curl https://computegpu.com/api/v1/gpu/pods/POD_ID \
  -H "Authorization: Bearer gpu_your_key"

# Pause billing (keep volume)
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/stop \
  -H "Authorization: Bearer gpu_your_key"

# Resume
curl -X PUT https://computegpu.com/api/v1/gpu/pods/POD_ID/resume \
  -H "Authorization: Bearer gpu_your_key"

# Destroy (permanent, final billing)
curl -X DELETE https://computegpu.com/api/v1/gpu/pods/POD_ID \
  -H "Authorization: Bearer gpu_your_key"
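
The instance lifecycle calls differ only in HTTP method and path suffix, so a client can keep them in one lookup table. This is a minimal sketch: the methods and paths are taken from the endpoint table above, while the names POD_ACTIONS and pod_request are hypothetical.

```python
import urllib.request

GPU_BASE = "https://computegpu.com/api/v1/gpu"

# action -> (HTTP method, path suffix), as listed in the endpoint table
POD_ACTIONS = {
    "status": ("GET", ""),
    "stop": ("PUT", "/stop"),
    "resume": ("PUT", "/resume"),
    "destroy": ("DELETE", ""),
}


def pod_request(pod_id: str, action: str, account_key: str) -> urllib.request.Request:
    """Build (but do not send) the request for one lifecycle action."""
    method, suffix = POD_ACTIONS[action]  # unknown actions raise KeyError
    return urllib.request.Request(
        f"{GPU_BASE}/pods/{pod_id}{suffix}",
        method=method,
        headers={"Authorization": f"Bearer {account_key}"},
    )
```

Passing the result to urllib.request.urlopen sends it. Note that "status" and "destroy" share a URL and differ only in method.
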
Deploy response
{
  "instance": {
    "id": "68abc123def456",
    "gpuType": "RTX 4090",
    "vramGb": 24,
    "pricePerHour": 0.51,
    "instanceType": "ollama",
    "status": "deploying",
    "pricingTier": "on-demand",
    "startedAt": "2026-04-30T15:00:00.000Z"
  }
}
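
A client will typically pull a few fields out of this response before polling for readiness. The sketch below reads only keys that appear in the sample above; the Instance dataclass and parse_deploy_response are hypothetical names, and the set of possible status values other than "deploying" is not documented here.

```python
import json
from dataclasses import dataclass


@dataclass
class Instance:
    id: str
    gpu_type: str
    price_per_hour: float
    status: str


def parse_deploy_response(body: str) -> Instance:
    """Extract the fields a client usually needs from a deploy response."""
    data = json.loads(body)["instance"]
    return Instance(
        id=data["id"],
        gpu_type=data["gpuType"],
        price_per_hour=data["pricePerHour"],
        status=data["status"],
    )
```

The returned id is what the /pods/:id endpoints expect.
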
Access types

Set instanceType in the deploy body:

Type | What you get
ollama | Ollama-compatible API endpoint via proxy. Default.
docker | Docker container with SSH + Jupyter access
ssh | Full shell access to the GPU machine
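
Validating the deploy body on the client side catches typos before a request is sent. In this sketch the three access types come from the table above, and the field names come from the /deploy body shown earlier. Restricting gpuCount to 1, 2, 4, or 8 is an assumption drawn from the /multi-gpu description; the API may accept other values. The name deploy_body is hypothetical.

```python
import json

INSTANCE_TYPES = {"ollama", "docker", "ssh"}
GPU_COUNTS = {1, 2, 4, 8}  # assumption: inferred from the 1/2/4/8x pricing tiers


def deploy_body(gpu_type, gpu_count=1, instance_type="ollama",
                models=None, volume_gb=None) -> str:
    """Return a JSON string suitable as the body of POST /deploy."""
    if instance_type not in INSTANCE_TYPES:
        raise ValueError(f"unknown instanceType: {instance_type!r}")
    if gpu_count not in GPU_COUNTS:
        raise ValueError(f"unsupported gpuCount: {gpu_count!r}")
    body = {"gpuType": gpu_type, "gpuCount": gpu_count, "instanceType": instance_type}
    if models:
        body["models"] = list(models)
    if volume_gb is not None:
        body["volumeGb"] = volume_gb
    return json.dumps(body)
```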

Serverless GPU API

Deploy inference endpoints that auto-scale to zero. Pay per second of compute, not per hour of uptime.

Base URL: https://computegpu.com/api/v1/serverless

Public (no auth)
Method | Endpoint | Description
GET | /templates | Pre-built handler templates (Ollama, vLLM, SDXL, Whisper, ComfyUI)
GET | /pricing | GPU pricing per second (with margin)
Authenticated (Authorization: Bearer gpu_...)
Method | Endpoint | Description
POST | /endpoints | Create serverless endpoint. Body: {"name","templateId","gpuType","workersMax"}
GET | /endpoints | List your endpoints with live metrics
GET | /endpoints/:id | Endpoint detail + workers + job history
PATCH | /endpoints/:id | Update scaling config (workersMin, workersMax, idleTimeout)
DELETE | /endpoints/:id | Delete endpoint and all workers
POST | /endpoints/:id/run | Sync inference (waits up to 30s for result)
POST | /endpoints/:id/run-async | Async job (returns job ID, poll for result)
GET | /jobs/:jobId?endpointId=... | Poll job status + output
POST | /endpoints/:id/cancel/:jobId | Cancel queued/running job
Example: Deploy + Inference
# Create an Ollama serverless endpoint
curl -X POST https://computegpu.com/api/v1/serverless/endpoints \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"name":"llama3-api","templateId":"tpl_ollama","gpuType":"AMPERE_24","workersMax":3}'

# Run sync inference
curl -X POST https://computegpu.com/api/v1/serverless/endpoints/ENDPOINT_ID/run \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"What is the capital of France?","max_tokens":100}}'

# Run async + poll
JOB_ID=$(curl -s -X POST .../endpoints/ENDPOINT_ID/run-async \
  -H "Authorization: Bearer gpu_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Explain quantum computing"}}' | jq -r '.jobId')

# Quote the URL so the shell does not interpret "?" as a glob character
curl ".../jobs/$JOB_ID?endpointId=ENDPOINT_ID" \
  -H "Authorization: Bearer gpu_your_key"
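
Polling an async job usually means repeating the GET /jobs/:jobId call until the job reaches a final state. The sketch below keeps the HTTP call out of the loop: fetch_status is any callable that returns the decoded job JSON. The "status" field name and the terminal values "completed", "failed", and "cancelled" are assumptions, since the job schema is not shown in this document; adjust them to the real response.

```python
import time

# Assumption: these status strings mark a finished job.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_job(fetch_status, interval=1.0, timeout=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job is terminal or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be exercised without real waiting.
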
Templates
Template | Use Case | Min VRAM
Ollama Serverless | Any Ollama model (Llama, Qwen, Gemma, Mistral) | 8 GB
vLLM Serverless | High-throughput LLM serving (OpenAI-compatible) | 16 GB
SDXL | Image generation (Stable Diffusion XL) | 12 GB
Whisper | Audio transcription | 8 GB
ComfyUI | ComfyUI workflows | 12 GB
Custom | Bring your own Docker handler | 4 GB
Billing
  • Billed per second of compute time (not wall clock)
  • Zero cost when scaled to zero (no idle charges)
  • Prepaid wallet: the cost is deducted when each job completes
  • Minimum $0.10 wallet balance to run jobs
  • Rate limit: 60 jobs/minute per user
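
To stay under the 60 jobs/minute limit, a client can throttle itself before submitting. The sketch below uses a sliding 60-second window. How the server actually measures the limit (sliding or fixed window) is not documented, so treat this as a conservative client-side guard; JobRateLimiter is a hypothetical name.

```python
import time
from collections import deque


class JobRateLimiter:
    """Track recent submissions and report how long to wait before the next."""

    def __init__(self, max_jobs=60, window=60.0, clock=time.monotonic):
        self.max_jobs = max_jobs
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of recent submissions, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before another job fits in the window (0.0 if none)."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_jobs:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call right after submitting a job."""
        self._sent.append(self.clock())
```

Typical use: sleep for limiter.wait_time() seconds, submit the job, then call limiter.record().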

GPU Clusters

Multi-node GPU clusters with InfiniBand interconnect. 2–32 instances per cluster, up to 8 GPUs per node.

Manage clusters via the dashboard. A Cluster API is coming soon; for now, clusters are available only through the dashboard UI.

Configuration
Setting | Range
Instance count | 2–32
GPUs per instance | 1–8
Max GPUs per cluster | 256
Interconnect | InfiniBand (3,200 Gbps)
Billing | Per-second per instance
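
The limits in the table can be checked before a cluster is requested. The ranges below are copied from the table, and the 256-GPU maximum follows from 32 instances × 8 GPUs. The function name total_gpus is hypothetical.

```python
def total_gpus(instance_count: int, gpus_per_instance: int) -> int:
    """Validate a cluster shape against the documented ranges and return its GPU total."""
    if not 2 <= instance_count <= 32:
        raise ValueError("instance count must be between 2 and 32")
    if not 1 <= gpus_per_instance <= 8:
        raise ValueError("GPUs per instance must be between 1 and 8")
    return instance_count * gpus_per_instance
```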


Network Storage

Persistent network volumes that survive instance restarts. Store models, datasets, and checkpoints. Attach to any instance or cluster.

Manage volumes via the dashboard. A Storage API is coming soon; for now, storage is available only through the dashboard UI.

Features
  • Persistent across instance restarts and redeployments
  • Shared across multiple instances in a cluster
  • Resize anytime without data loss
  • Available in multiple regions

Wallet API

Prepaid credits for GPU rentals. Add funds via inline Stripe Elements (cards, Apple Pay, Google Pay).

Base URL: https://computegpu.com/wallet (session auth required)

Method | Endpoint | Description
POST | /wallet/create-intent | Create Stripe PaymentIntent. Body: {"amount":10}. Returns clientSecret
POST | /wallet/confirm | Verify payment + credit wallet. Body: {"paymentIntentId":"pi_..."}. Returns balance
Billing rules
  • Minimum top-up: $5
  • Wallet balance must cover at least 1 hour of the GPU's cost to start a rental
  • Pods: billed per second of uptime. Auto-stops at $0.
  • Serverless: billed per second of compute. $0 when idle.
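
Two of these rules reduce to simple arithmetic: the one-hour balance check before a rental starts, and how long a pod can run before the balance reaches $0. The sketch below assumes a constant hourly price and no other charges on the wallet. Both function names are hypothetical.

```python
def can_start_rental(balance: float, price_per_hour: float) -> bool:
    """True when the wallet covers at least one hour at the given price."""
    return balance >= price_per_hour


def seconds_until_auto_stop(balance: float, price_per_hour: float) -> float:
    """Seconds of uptime the balance buys under per-second billing."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return balance * 3600 / price_per_hour
```

For instance, a $1.00 balance at $0.50/hr lasts 7,200 seconds (two hours).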

Agent Signup

Programmatic provisioning for AI agents. Provide an email — we send the account owner a signup link. They create an account, add credits, and get an API key.

Base URL: https://computegpu.com/api/signup

Method | Endpoint | Description
POST | /api/signup | Send signup invite. Body: {"email":"user@co.com","tenantId":"my-agent"}
GET | /api/signup/plans | Available plans + pricing
Example
# Request agent account (emails the human a signup link)
curl -X POST https://computegpu.com/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email":"dev@company.com","tenantId":"my-agent","agent":"claude-code"}'

# Human clicks link → creates account → tops up wallet → gets API key
# Use the key for all GPU Cloud + Serverless operations
How it works
  • Agent calls POST /api/signup with an email + tenant ID
  • Account owner receives email with a signup link
  • Owner creates account, adds credits via credit card (minimum $5 top-up)
  • API key available on dashboard — share with your agent
  • Per-second billing. No free credit. Pay only for what you use.
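
The first step, the agent's POST /api/signup call, can be sketched in Python as follows. The URL and the email, tenantId, and agent fields are taken from the table and curl example above. The helper name signup_request is hypothetical, and the response format is not documented here.

```python
import json
import urllib.request


def signup_request(email, tenant_id, agent=None):
    """Build (but do not send) the signup invite request."""
    body = {"email": email, "tenantId": tenant_id}
    if agent:
        body["agent"] = agent  # optional field, as in the curl example
    return urllib.request.Request(
        "https://computegpu.com/api/signup",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it with urllib.request.urlopen triggers the email. The remaining steps are carried out by the account owner in the browser.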

τAI Chat API

AI-powered support chat. 14 domain experts for GPUs, billing, deployments, and more.

Base URL: https://computegpu.com/api/support

Method | Endpoint | Description
POST | /api/support/message | Send message to τAI. Body: {"message":"...","sessionId":"..."}
GET | /api/support/history?sessionId=... | Load chat history (max 50 messages, 24h TTL)
Example
curl -X POST https://computegpu.com/api/support/message \
  -H "Content-Type: application/json" \
  -d '{"message":"How much is an RTX 4090?","sessionId":"my-session"}'

# Returns: {"answer":"...","experts":["marketplace"],"fastPath":true}
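
A small typed wrapper makes the reply easier to work with. The sketch reads only the three fields shown in the sample response (answer, experts, fastPath). ChatReply, chat_payload, and parse_reply are hypothetical names, and real replies may carry more fields.

```python
import json
from dataclasses import dataclass


@dataclass
class ChatReply:
    answer: str
    experts: list
    fast_path: bool


def chat_payload(message: str, session_id: str) -> bytes:
    """Encode the JSON body for POST /api/support/message."""
    return json.dumps({"message": message, "sessionId": session_id}).encode("utf-8")


def parse_reply(raw: str) -> ChatReply:
    """Decode a reply, tolerating missing optional fields."""
    data = json.loads(raw)
    return ChatReply(
        answer=data["answer"],
        experts=data.get("experts", []),
        fast_path=bool(data.get("fastPath", False)),
    )
```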

Instance Types

Ollama API

Drop-in Ollama endpoint. Send requests to the proxy, get responses. Zero config.

Best for: LLM inference, chatbots, RAG pipelines

Docker

Run any Docker image on the host GPU. SSH access + optional Jupyter. Use templates for pre-built stacks.

Best for: Training, fine-tuning, custom workloads

SSH

Full shell access to the host machine. Bring your own tools, run anything.

Best for: Research, custom setups, multi-GPU workflows

Billing

Per-second billing

Everything is billed per second. Prices displayed as $/hr for readability. Stop after 47 seconds? Pay for 47 seconds.
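
The conversion from a displayed $/hr price to a per-second charge is the hourly price divided by 3,600, multiplied by the seconds used. The sketch below uses Decimal to avoid floating-point drift. How the platform rounds the final amount is not documented, so no rounding is applied. The function name cost_for_seconds is hypothetical.

```python
from decimal import Decimal


def cost_for_seconds(price_per_hour: str, seconds: int) -> Decimal:
    """Cost of running for `seconds` at an hourly price given as a decimal string."""
    return Decimal(price_per_hour) * seconds / 3600
```

At a hypothetical price of $3.60/hr, 47 seconds costs 3.60 × 47 / 3600 = $0.047.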

Product billing
Product | Billed on | Idle cost
GPU Instances | Per second of uptime | Continues until paused/destroyed
Serverless | Per second of compute | $0 when scaled to zero
Clusters | Per second per instance | Continues until destroyed
Pricing tiers
Tier | Price | Guarantee
On-Demand | Listed price | Cannot be preempted
Spot | ~50% off | Can be preempted (5 min grace)
Wallet
  • Prepaid credits — add via inline Stripe (cards, Apple Pay, Google Pay)
  • Minimum top-up: $5. Presets: $5, $10, $25, $50, $100
  • Instance auto-stops when balance hits $0
  • Serverless requires $0.10 minimum to run jobs

Templates

970+ pre-built templates for pods and serverless endpoints. Browse the Template Hub.

  • Serverless templates — Ollama, vLLM, SDXL, Whisper, ComfyUI, SGLang, Faster Whisper, and more
  • Instance templates — PyTorch, TensorFlow, JupyterLab, Ubuntu, and 940+ community templates

Each template specifies a Docker image, minimum VRAM, and access type (Jupyter/SSH). Pick one, choose a GPU, deploy.


Open Source

ComputeGpu is built on open-source infrastructure and contributes back to the ecosystem.

Tyga.Cloud ecosystem
Project | What it does | License
easytyga | AI inference tunnel. Exposes local Ollama/vLLM to the internet with one command (npx easytyga). Auto GPU detection, built-in API key auth. Powers host connectivity on ComputeGpu. | MIT
Agentic Memory | Long-term memory for AI agents. Persistent conversation context across sessions. Powers the optional memory layer on GPU rentals. | Proprietary
Projects we use and thank
Project | How we use it
Ollama | The inference runtime that GPU hosts run. Our proxy speaks native Ollama API - every rental is an Ollama endpoint.
HuggingFace | Model hub. Hosts pull GGUF models from the HuggingFace Hub to serve on their GPUs.
llama.cpp | The C++ inference engine that Ollama wraps. Makes local GPU inference fast and efficient.
h-network LLN | LLM-Native Notation by Halil Ibrahim Baysal. Compact context encoding that reduces memory token costs.