Not live yet. The most beautiful coding agent ever made is almost here, and it's free. Join the waitlist →
Goal

Give it the outcome.

Picasso keeps the objective alive across turns, tools, tests, failures, and approvals until the criteria pass or a boundary is reached.

picasso goal

picasso · picasso-platform/api · goal · migrate-to-argon2id
opus 4.7 · plan │ sonnet 4.7 · act

Goal: Migrate auth module to argon2id, tests green.
Status: running · goal · gl_01HVT...zR2 · started 3h 22m ago

Cost: $1.84 / $5.00
Steps: 287 / 1000
Wall time: 3h 22m / 24h
Completion: 62% · 2 / 3 criteria
iter 48 · re-plan in 12 steps

Now: Refactoring src/auth/session.ts to call the new hashing API.

str_replace · src/auth/session.ts · ok · 14ms
  88 - const hash = await bcrypt.hash(password, 12);
  88 + const hash = await argon2.hash(password);

bash · pnpm test src/auth/session.test.ts · streaming · 7s
  RUN v1.6.0 /api
  verify · accepts current session (8 ms)
  verify · rejects expired session (3 ms)
  hash · argon2id round-trip (112 ms)
  · running migration compat suite

Memory writes, this run:
  decision: Bridge bcrypt verifies for 90 days, then auto-migrate on next login.
  fact: Auth reference cases hold pre-migration hashes.
  decision: Drop bcrypt dependency after final migration window.
  style: Use argon2.verify over manual hash compare. Reinforced.

goal/argon2id · tools 287 · ctx 184K / 200K · cost $1.84 · mode goal · iter 48 · acting

The contract.

Every goal starts by stating the work, the proof, the limits, the permissions, the budgets, and the stop conditions. A goal is not a loose prompt. It is a bounded agreement about what done means and when Picasso must pause.

Agent loop: plan / act / observe / reflect
01 Plan: read repo, draft plan
02 Act: edit files, run tools
03 Observe: tests, logs, diffs
04 Reflect: continue or stop

Completion criteria
objective: Migrate all .js files to .ts
completionCriteria: [all-tests-pass, lint-clean]
budget: { dollars: 5, hours: 4, steps: 1000 }
scope: packages/*
permissions: safe
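
For readers who think in types, the sample contract above could be modeled roughly like this. It is a sketch only: the `GoalContract` type, the `contractProblems` helper, and the checks it runs are assumptions made for illustration, not Picasso's actual schema or validation rules.

```typescript
// Hypothetical shape for a goal contract; field names mirror the sample above.
interface GoalContract {
  objective: string;
  completionCriteria: string[];
  budget: { dollars: number; hours: number; steps: number };
  scope: string;
  permissions: string;
}

// Returns a list of problems; an empty list means the contract looks well-formed.
// These checks are illustrative assumptions, not a documented rule set.
function contractProblems(c: GoalContract): string[] {
  const problems: string[] = [];
  if (c.objective.trim() === "") problems.push("objective is empty");
  if (c.completionCriteria.length === 0) problems.push("no completion criteria");
  if (!(c.budget.dollars > 0)) problems.push("budget.dollars must be positive");
  if (!(c.budget.hours > 0)) problems.push("budget.hours must be positive");
  if (!(c.budget.steps > 0)) problems.push("budget.steps must be positive");
  return problems;
}

const example: GoalContract = {
  objective: "Migrate all .js files to .ts",
  completionCriteria: ["all-tests-pass", "lint-clean"],
  budget: { dollars: 5, hours: 4, steps: 1000 },
  scope: "packages/*",
  permissions: "safe",
};
```

Returning a list of problems rather than a boolean lets a reviewer see every gap in the contract at once before approving it.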

The loop.

Plan.

Build the next stage from the objective, completion criteria, retrieved memory, and current workspace state.

Act.

Run edits, tools, tests, MCP calls, and subagents inside permission, sandbox, and budget limits.

Observe.

Read test results, diffs, logs, build output, provider responses, cache state, and failures.

Assess.

Continue, re-plan, pause, ask for approval, or stop against the contract.
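
The four stages can be sketched as a small driver loop. Everything here is hypothetical: the `Observation` fields, the `assess` and `runLoop` names, and the three-failure re-plan threshold are assumptions chosen to show the shape of the decision, not how Picasso decides in practice.

```typescript
type Verdict = "continue" | "re-plan" | "pause" | "ask-approval" | "stop";

// What one act/observe round reports back (assumed fields).
interface Observation {
  criteriaPassed: boolean;     // every completion criterion is green
  needsApproval: boolean;      // the next step is gated by policy
  consecutiveFailures: number; // failed steps in a row
}

// Assess: decide what happens next, measured against the contract.
function assess(obs: Observation, stepsUsed: number, stepBudget: number): Verdict {
  if (obs.criteriaPassed) return "stop";
  if (stepsUsed >= stepBudget) return "pause"; // boundary reached
  if (obs.needsApproval) return "ask-approval";
  if (obs.consecutiveFailures >= 3) return "re-plan"; // arbitrary threshold
  return "continue";
}

// Drives act/observe rounds until the verdict hands control back to the operator.
function runLoop(
  act: (step: number) => Observation,
  stepBudget: number
): { verdict: Verdict; steps: number } {
  let steps = 0;
  while (true) {
    const obs = act(steps);
    steps += 1;
    const verdict = assess(obs, steps, stepBudget);
    // A fuller loop would rebuild the plan on "re-plan"; this sketch just keeps going.
    if (verdict === "continue" || verdict === "re-plan") continue;
    return { verdict, steps };
  }
}
```

Because the step budget is checked on every round, the loop always terminates: it either stops on green criteria or pauses at the boundary.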

It doesn't grade its own homework.

Every goal needs proof a machine can check — tests that pass, commands that exit clean, a score that must move. "Looks done" is not done. Picasso critiques the contract before you approve it, runs the baseline before the loop starts, and when the loop says finished, a final audit re-runs every criterion fresh and checks every deliverable against the working tree. Anything less gets sent back to work.

The oracle gate.

A goal without a verifiable finish line is refused, not attempted. Self-assessment alone never closes a goal.

Pre-flight.

The baseline runs before the work. A red baseline stops the goal before it spends a single token.
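
One way to picture the gate and the pre-flight together is shown below. The `Criterion` shape, the `admitGoal` function, and the idea of passing the baseline in as a callback are all assumptions made for this sketch.

```typescript
interface Criterion {
  name: string;
  check?: () => boolean; // machine-checkable oracle; absent means "judged by eye"
}

interface Admission {
  admitted: boolean;
  reason?: string; // set only when the goal is refused
}

// Gate first, baseline second: a goal with no verifiable criterion is refused
// before the baseline (or anything else) runs.
function admitGoal(criteria: Criterion[], baseline: () => boolean): Admission {
  const verifiable = criteria.filter(function (c) {
    return typeof c.check === "function";
  });
  if (verifiable.length === 0) {
    return { admitted: false, reason: "no machine-checkable criterion" };
  }
  if (!baseline()) {
    return { admitted: false, reason: "baseline is red" };
  }
  return { admitted: true };
}
```

In this sketch a refused goal never touches the workspace, which matches the idea that it is refused rather than attempted.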

The final audit.

Every machine criterion re-verified fresh. Every deliverable checked on disk. And a visible coverage score for everything a test can't prove.

No debris.

Debug prints, dead imports, and stray TODOs are flagged before a goal may call itself complete.
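
A debris check of this kind can be approximated with a line scan. The patterns and the `scanForDebris` name below are assumptions for illustration, and the sketch covers only debug prints and stray TODOs; detecting dead imports needs real static analysis and is left out.

```typescript
interface DebrisHit {
  line: number; // 1-based line number
  kind: string;
  text: string;
}

// Illustrative patterns only; a real check would be language-aware.
const DEBRIS_PATTERNS: { kind: string; pattern: RegExp }[] = [
  { kind: "debug-print", pattern: /\bconsole\.(log|debug)\s*\(/ },
  { kind: "todo", pattern: /\b(TODO|FIXME)\b/ },
];

function scanForDebris(source: string): DebrisHit[] {
  const hits: DebrisHit[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    for (let j = 0; j < DEBRIS_PATTERNS.length; j++) {
      if (DEBRIS_PATTERNS[j].pattern.test(lines[i])) {
        hits.push({ line: i + 1, kind: DEBRIS_PATTERNS[j].kind, text: lines[i].trim() });
      }
    }
  }
  return hits;
}
```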

Mistakes get caught mid-stroke.

An edit that does not parse is rejected and reverted the moment it is written, with the error handed straight back to the agent. And every iteration that resolves a blocker writes the lesson to memory — so the next goal starts smarter than the last.

Edit guardrail.

Unparseable code never lands. The bad write rolls back; the agent learns why.
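
The write-then-verify-then-roll-back idea can be sketched with an in-memory file map. The `guardedWrite` function and the `Parser` type are hypothetical, and the usage below uses `JSON.parse` as a stand-in parser; a real guardrail would use a parser for the language being edited.

```typescript
type Parser = (source: string) => void; // expected to throw on a syntax error

interface EditResult {
  applied: boolean;
  error?: string; // the parse error, handed back to the agent
}

function guardedWrite(
  files: { [path: string]: string },
  path: string,
  next: string,
  parse: Parser
): EditResult {
  const existed = Object.prototype.hasOwnProperty.call(files, path);
  const previous = files[path];
  files[path] = next; // write first, then verify
  try {
    parse(next);
    return { applied: true };
  } catch (err) {
    // Roll back to the exact prior state, including "file did not exist".
    if (existed) {
      files[path] = previous;
    } else {
      delete files[path];
    }
    return { applied: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Stand-in parser for the example: treats the file as JSON.
const jsonParser: Parser = function (source) {
  JSON.parse(source);
};
```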

A score that must move.

Numeric fitness criteria capture a baseline at pre-flight and demand measurable improvement — in the right direction, by the margin you set.
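
As a minimal sketch, a numeric criterion reduces to a baseline, a direction, and a margin. The `FitnessCriterion` shape and the `fitnessMet` function are assumed names, and the bundle-size numbers in the usage are made up for illustration.

```typescript
type Direction = "increase" | "decrease";

interface FitnessCriterion {
  baseline: number;     // captured at pre-flight
  direction: Direction; // which way counts as better
  margin: number;       // minimum improvement required, in the metric's own units
}

// True only when the score moved the right way by at least the margin.
function fitnessMet(c: FitnessCriterion, current: number): boolean {
  const gain = c.direction === "increase" ? current - c.baseline : c.baseline - current;
  return gain >= c.margin;
}

// Example: a bundle must shrink by at least 20 KB from a 420 KB baseline.
const bundleSize: FitnessCriterion = { baseline: 420, direction: "decrease", margin: 20 };
```

With this criterion, a 405 KB bundle fails (only 15 KB better), a 395 KB bundle passes, and a 440 KB bundle fails because it moved in the wrong direction.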

Memory writeback.

Resolved blockers become memory. Goals compound.

Human control stays live.

A running goal can be paused, resumed, cancelled, inspected, or replayed. Permission mode changes, approval requirements, exhausted budgets, missing credentials, and risky mutations stop the loop instead of silently pushing through.

Pause and resume.

Stop the loop without losing the objective or evidence.

Cancel.

End the goal with cancellation state and audit trail intact.

Approve.

Sensitive steps require whatever operator proof the policy demands.

Complete.

The goal closes only when completion criteria are satisfied.

Evidence, not vibes.

The TUI and Mac app show goal progress from the real backend ledger: provider calls, tool calls, command output, step state, budgets, cache events, memory writes, subagent results, and completion checks.

Studio: memory / plans / goals / replay
memory · sketches · canvases · goals
memory: auth module uses argon2id
plan: 3 files, 2 risks, 4 checks
goal: 62 percent complete
replay: 48 strokes recorded

Good goals.

Long refactors.

Move a shared API, update callers, and keep tests green across packages.

Backlog cleanup.

Work through a queue of scoped tasks, each verified before the next begins.

Feature completion.

Turn a product spec into staged implementation, tests, and final review.

Migrations.

Move frameworks, languages, or dependencies with repeatable checks at each stage.

Cross-provider.

The same goal contract can run through hosted, managed, BYO-key, subscription, or local routes. Anthropic, OpenAI, Microsoft Foundry, Google, xAI, Qwen, Xiaomi MiMo, Kimi, DeepSeek, Zhipu GLM, MiniMax, OpenRouter, Ollama, and vLLM stay behind the same harness.

Model access: four paths
Managed · no setup · sponsor-supported
Claude / ChatGPT · subscription · passthrough
Provider keys · BYO · direct billing
Local · Ollama / vLLM · your machine

Budgets and transparency.

Picasso tracks per-session, per-day, per-goal, sponsor, provider, alias, and route budgets, pauses when exhausted, and keeps the status line honest. Autonomy stays useful because the limit is visible.
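
Layered budgets can be pictured as one ledger charged against several scopes at once. The `BudgetLedger` class, its method names, and the choice to count in cents are assumptions for this sketch; only the scope names come from the list above.

```typescript
interface Budget {
  limitCents: number;
  spentCents: number;
}

class BudgetLedger {
  private budgets: { [scope: string]: Budget } = {};

  setLimit(scope: string, limitCents: number): void {
    this.budgets[scope] = { limitCents: limitCents, spentCents: 0 };
  }

  // Charges every listed scope and returns the scopes that are now exhausted.
  charge(scopes: string[], cents: number): string[] {
    const exhausted: string[] = [];
    for (let i = 0; i < scopes.length; i++) {
      const b = this.budgets[scopes[i]];
      if (!b) continue; // a scope without a limit is untracked in this sketch
      b.spentCents += cents;
      if (b.spentCents >= b.limitCents) exhausted.push(scopes[i]);
    }
    return exhausted;
  }

  // Formats spend against limit, e.g. "$1.84 / $5.00".
  statusLine(scope: string): string {
    const b = this.budgets[scope];
    if (!b) return "no budget";
    return "$" + (b.spentCents / 100).toFixed(2) + " / $" + (b.limitCents / 100).toFixed(2);
  }
}
```

A caller would pause the goal whenever `charge` returns a non-empty list. Counting in whole cents avoids floating-point drift in the running totals.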

The waitlist
Code Freely.

Picasso for Mac is almost here — a coding agent that looks the way serious tools should, and costs what creative freedom should: nothing. Leave your email and be first on the canvas.

Not live yet. Free for developers when it is — that's the point.

Sponsors and labs — the early canvas is yours. Choose Sponsor or Lab above and we'll reach out before launch.