Goal

Give it the outcome.

Picasso keeps the objective alive across turns, tools, tests, failures, and approvals until the criteria pass or a boundary is reached.

Run a goal.

picasso

goal

picasso › picasso-platform/api › goal · migrate-to-argon2id

opus 4.7 · plan │ sonnet 4.7 · act

goal

Migrate auth module to argon2id, tests green.

running

goal · gl_01HVT...zR2 · started 3h 22m ago

Cost: $1.84 / $5.00

Steps: 287 / 1000

Wall time: 3h 22m / 24h

Completion: 62% · 2 / 3 criteria

iter 48 · re-plan in 12 steps

now

Refactoring src/auth/session.ts to call the new hashing API.

▾ str_replace · src/auth/session.ts · ok · 14ms

88 − const hash = await bcrypt.hash(password, 12);
88 + const hash = await argon2.hash(password);

bash · pnpm test src/auth/session.test.ts · streaming · 7s

RUN v1.6.0 /api
✓ verify · accepts current session (8 ms)
✓ verify · rejects expired session (3 ms)
✓ hash · argon2id round-trip (112 ms)
· running migration compat suite

memory writes · this run

decision: Bridge bcrypt verifies for 90 days, then auto-migrate on next login.

fact: Auth reference cases hold pre-migration hashes.

decision: Drop bcrypt dependency after final migration window.

style: Use argon2.verify over manual hash compare. Reinforced.

∠ goal/argon2id · tools 287 · ctx 184K / 200K · cost $1.84 · mode goal · iter 48 · acting

The contract.

Every goal starts with a contract: the work, the proof, limits, permissions, budgets, and stop conditions. A goal is not a loose prompt. It is a bounded agreement about what done means and when Picasso must pause.

agent loop: plan / act / observe / reflect

01 Plan: read repo, draft plan

02 Act: edit files, run tools

03 Observe: tests, logs, diffs

04 Reflect: continue or stop

completion criteria

objective: Migrate all .js files to .ts

completionCriteria: [all-tests-pass, lint-clean]

budget: { dollars: 5, hours: 4, steps: 1000 }

scope: packages/*

permissions: safe
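
One way to picture the contract as data is a typed record with a small validator. This is a minimal sketch, not Picasso's actual schema: the field names mirror the example above, while `GoalContract`, `Budget`, and `validateContract` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape for a goal contract; field names follow the example above.
type Budget = {
  dollars: number;
  hours: number;
  steps: number;
};

interface GoalContract {
  objective: string;
  completionCriteria: string[];
  budget: Budget;
  scope: string;
  permissions: string;
}

// Returns a list of problems; an empty list means the contract is well-formed.
function validateContract(c: GoalContract): string[] {
  const problems: string[] = [];
  if (c.objective.trim() === "") problems.push("objective is empty");
  if (c.completionCriteria.length === 0) {
    problems.push("no completion criteria: nothing a machine can check");
  }
  const keys: (keyof Budget)[] = ["dollars", "hours", "steps"];
  for (const key of keys) {
    if (!(c.budget[key] > 0)) problems.push(`budget.${key} must be positive`);
  }
  return problems;
}

const exampleContract: GoalContract = {
  objective: "Migrate all .js files to .ts",
  completionCriteria: ["all-tests-pass", "lint-clean"],
  budget: { dollars: 5, hours: 4, steps: 1000 },
  scope: "packages/*",
  permissions: "safe",
};
```

Returning a list of problems rather than throwing makes it easy to show the operator everything wrong with a draft contract at once.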

The loop.

Plan.

Build the next stage from the objective, completion criteria, retrieved memory, and current workspace state.

Act.

Run edits, tools, tests, MCP calls, and subagents inside permission, sandbox, and budget limits.

Observe.

Read test results, diffs, logs, build output, provider responses, cache state, and failures.

Reflect.

Continue, re-plan, pause, ask for approval, or stop against the contract.
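
The four stages above can be sketched as a bounded loop. This is an illustrative outline under stated assumptions, not Picasso's implementation: `LoopHooks`, `LoopOutcome`, and `runLoop` are hypothetical names, and a single step limit stands in for the full set of budgets.

```typescript
// Hypothetical hooks for one pass of the loop.
interface LoopHooks<P, R> {
  plan: (iteration: number) => P;         // build the next stage
  act: (plan: P) => R;                    // run edits, tools, tests
  observe: (result: R) => boolean;        // true when all criteria pass
  needsApproval?: (plan: P) => boolean;   // true when a human must sign off
}

interface LoopOutcome {
  status: "complete" | "paused" | "budget-exhausted";
  steps: number; // number of act steps actually executed
}

function runLoop<P, R>(hooks: LoopHooks<P, R>, maxSteps: number): LoopOutcome {
  for (let step = 0; step < maxSteps; step++) {
    const plan = hooks.plan(step);
    // Pause before acting if the planned step needs approval.
    if (hooks.needsApproval?.(plan)) return { status: "paused", steps: step };
    const result = hooks.act(plan);
    // Stop only when the observed evidence says the criteria pass.
    if (hooks.observe(result)) return { status: "complete", steps: step + 1 };
  }
  // The boundary was reached before the criteria passed.
  return { status: "budget-exhausted", steps: maxSteps };
}
```

The loop has exactly three exits (criteria pass, approval required, budget reached), which matches the idea that a goal runs until the criteria pass or a boundary is hit.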

It doesn't grade its own homework.

Every goal needs proof a machine can check: tests that pass, commands that exit clean, a score that must move. "Looks done" is not done. Picasso critiques the contract before you approve it and runs the baseline before the loop starts. When the loop says finished, a final audit re-runs every criterion fresh and checks every deliverable against the working tree. Anything less gets sent back to work.

The oracle gate.

A goal without a verifiable finish line is refused, not attempted. Self-assessment alone never closes a goal.

Pre-flight.

The baseline runs before the work. A red baseline stops the goal before it spends a single token.

The final audit.

Every machine criterion re-verified fresh. Every deliverable checked on disk. And a visible coverage score for everything a test can't prove.

No debris.

Debug prints, dead imports, and stray TODOs are flagged before a goal may call itself complete.
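
The gate, the pre-flight, and the final audit can be pictured as three small checks over the same list of machine-runnable criteria. This is a sketch under assumptions: `Criterion`, `GateResult`, `oracleGate`, `preflight`, and `finalAudit` are hypothetical names, and each `check` is assumed to wrap something like "did the test command exit 0?".

```typescript
// Hypothetical model: each criterion wraps a check that a machine can run.
interface Criterion {
  name: string;
  check: () => boolean;
}

type GateResult = { ok: true } | { ok: false; reason: string };

// Oracle gate: refuse a goal that has no machine-checkable criteria.
function oracleGate(criteria: Criterion[]): GateResult {
  return criteria.length > 0
    ? { ok: true }
    : { ok: false, reason: "no verifiable finish line" };
}

// Pre-flight: every baseline check must be green before any work starts.
function preflight(baseline: Criterion[]): GateResult {
  const red = baseline.filter((c) => !c.check()).map((c) => c.name);
  return red.length === 0
    ? { ok: true }
    : { ok: false, reason: `red baseline: ${red.join(", ")}` };
}

// Final audit: re-run every criterion fresh instead of trusting earlier runs.
function finalAudit(criteria: Criterion[]): GateResult {
  const failed = criteria.filter((c) => !c.check()).map((c) => c.name);
  return failed.length === 0
    ? { ok: true }
    : { ok: false, reason: `failed: ${failed.join(", ")}` };
}
```

Because `finalAudit` calls each `check` again rather than reading a cached result, a criterion that passed mid-run but regressed later still fails the audit.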

Mistakes get caught mid-stroke.

An edit that does not parse is rejected and reverted the moment it is written, with the error handed straight back to the agent. Every iteration that resolves a blocker also writes the lesson to memory, so the next goal starts smarter than the last.

Edit guardrail.

Unparseable code never lands. The bad write rolls back; the agent learns why.
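
A write-then-validate guard with rollback might look like the sketch below. It assumes an in-memory file map and a pluggable parse check. `guardedWrite`, `ParseCheck`, and `jsonParses` are hypothetical names, and JSON stands in for whatever language parser a real guardrail would use.

```typescript
// A parse check returns null when the source parses, or an error message.
type ParseCheck = (source: string) => string | null;

interface EditResult {
  applied: boolean;
  error?: string; // handed back to the agent when the edit is rejected
}

function guardedWrite(
  files: Map<string, string>,
  path: string,
  next: string,
  parses: ParseCheck,
): EditResult {
  const previous = files.get(path);
  files.set(path, next); // write first, then validate
  const error = parses(next);
  if (error === null) return { applied: true };
  // Roll back: restore the old content, or remove a file that did not exist.
  if (previous === undefined) files.delete(path);
  else files.set(path, previous);
  return { applied: false, error };
}

// Example parse check that treats JSON as the "language".
const jsonParses: ParseCheck = (source) => {
  try {
    JSON.parse(source);
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
};
```

The rejected edit leaves the map exactly as it was, and the returned `error` is the feedback the agent would use to retry.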

A score that must move.

Numeric fitness criteria capture a baseline at pre-flight and demand measurable improvement in the right direction, by the margin you set.
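
As a rough illustration, a numeric criterion reduces to a signed difference compared against a margin. `FitnessCriterion` and `improved` are hypothetical names, and the margin is assumed to be an absolute amount in the metric's own units.

```typescript
type Direction = "increase" | "decrease";

interface FitnessCriterion {
  baseline: number;     // captured at pre-flight
  direction: Direction; // which way counts as better
  margin: number;       // minimum required change, in the metric's own units
}

// True only when the metric moved the right way by at least the margin.
function improved(c: FitnessCriterion, current: number): boolean {
  const delta =
    c.direction === "increase" ? current - c.baseline : c.baseline - current;
  return delta >= c.margin;
}
```

For example, with a bundle size baseline of 500 (in KB), direction `decrease`, and margin 50, a result of 440 passes, while 460 does not and 560 moves the wrong way.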

Memory writeback.

Resolved blockers become memory. Goals compound.

Human control stays live.

A running goal can be paused, resumed, cancelled, inspected, or replayed. Permission mode changes, approval requirements, exhausted budgets, missing credentials, and risky mutations stop the loop instead of silently pushing through.

Pause and resume.

Stop the loop without losing the objective or evidence.

Cancel.

End the goal with cancellation state and audit trail intact.

Approve.

Sensitive steps require whatever operator proof the policy demands.

Complete.

The goal closes only when completion criteria are satisfied.
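
These controls can be read as a small state machine. The sketch below is an assumption about how such a lifecycle could be modeled, not a description of Picasso's internals. The state and event names (`GoalState`, `GoalEvent`, `transition`) are hypothetical.

```typescript
type GoalState =
  | "running"
  | "paused"
  | "awaiting-approval"
  | "cancelled"
  | "complete";

type GoalEvent =
  | "pause"
  | "resume"
  | "cancel"
  | "request-approval"
  | "approve"
  | "criteria-pass";

// Allowed transitions; anything not listed leaves the state unchanged.
const transitions: Record<GoalState, Partial<Record<GoalEvent, GoalState>>> = {
  running: {
    pause: "paused",
    cancel: "cancelled",
    "request-approval": "awaiting-approval",
    "criteria-pass": "complete",
  },
  paused: { resume: "running", cancel: "cancelled" },
  "awaiting-approval": { approve: "running", cancel: "cancelled" },
  cancelled: {},
  complete: {},
};

function transition(state: GoalState, event: GoalEvent): GoalState {
  return transitions[state][event] ?? state;
}
```

In this model `complete` is reachable only from `running` via `criteria-pass`, so a paused or unapproved goal cannot close itself.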

Evidence, not vibes.

The TUI and Mac app show goal progress from the real backend ledger: provider calls, tool calls, command output, step state, budgets, cache events, memory writes, subagent results, and completion checks.

Studio: memory / plans / goals / replay

memory · sketches · canvases · goals

memory: auth module uses argon2id

plan: 3 files, 2 risks, 4 checks

goal: 62 percent complete

replay: 48 strokes recorded

Good goals.

Long refactors.

Move a shared API, update callers, and keep tests green across packages.

Backlog cleanup.

Work through a queue of scoped tasks, each verified before the next begins.

Feature completion.

Turn a product spec into staged implementation, tests, and final review.

Migrations.

Move frameworks, languages, or dependencies with repeatable checks at each stage.

Cross-provider.

The same goal contract can run through hosted, managed, BYO-key, subscription, or local routes. Anthropic, OpenAI, Microsoft Foundry, Google, xAI, Qwen, Xiaomi MiMo, Kimi, DeepSeek, Zhipu GLM, MiniMax, OpenRouter, Ollama, and vLLM stay behind the same harness.

model access: four paths

Managed · no setup · sponsor-supported

Claude / ChatGPT · subscription · passthrough

Provider keys · BYO · direct billing

Local · Ollama / vLLM · your machine

Budgets and transparency.

Picasso tracks per-session, per-day, per-goal, sponsor, provider, alias, and route budgets, pauses when exhausted, and keeps the status line honest. Autonomy stays useful because the limit is visible.
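
A layered budget check could be sketched as follows. This is illustrative only: `BudgetLine`, `charge`, and `statusLine` are hypothetical names, amounts are kept in integer cents to avoid floating-point drift, and the rule that one exhausted scope blocks the whole charge is an assumption.

```typescript
// Hypothetical budget ledger entry for one scope.
interface BudgetLine {
  scope: string; // e.g. "session", "day", "goal"
  limitCents: number;
  spentCents: number;
}

interface ChargeResult {
  allowed: boolean;
  exhausted: string[]; // scopes that would go over their limit
}

// Charge every scope or none: one exhausted scope is enough to pause the run.
function charge(lines: BudgetLine[], cents: number): ChargeResult {
  const exhausted = lines
    .filter((l) => l.spentCents + cents > l.limitCents)
    .map((l) => l.scope);
  if (exhausted.length > 0) return { allowed: false, exhausted };
  for (const l of lines) l.spentCents += cents;
  return { allowed: true, exhausted: [] };
}

// Status-line text for one scope, formatted as spent / limit in dollars.
function statusLine(l: BudgetLine): string {
  const usd = (c: number) => `$${(c / 100).toFixed(2)}`;
  return `${usd(l.spentCents)} / ${usd(l.limitCents)}`;
}
```

Checking all scopes before mutating any of them keeps the ledger consistent: a refused charge leaves every counter untouched, so the displayed figures never run ahead of what was actually spent.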