hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 11:38:29 +00:00

Author	SHA1	Message	Date
Teknium	ed0e2ab371	chore(providers): remove dead cloudcode-pa quota-fallback branches The google-antigravity and google-gemini-cli OAuth providers were removed in #50492. They were the only producers of a cloudcode-pa:// base_url, so the account-level-quota early-returns in _pool_may_recover_from_rate_limit and _credential_pool_may_recover_rate_limit are now unreachable. - Drop the dead cloudcode-pa:// checks and the now-unused provider/base_url params on _pool_may_recover_from_rate_limit (only caller updated). - Prune the obsolete CloudCode-specific regression tests; keep the live single/multi-entry pool-rotation invariants (#11314).	2026-06-23 11:26:03 -07:00
Teknium	70d28b62fb	feat(cli): track background subagents in the status bar (#51441) The classic prompt_toolkit status bar already shows two background indicators: ▶ N (/background agent threads) and ⚙ N (shell processes spawned by terminal(background=true)). Background/async subagents (delegate_task batches and background single delegations) had no indicator despite being long-running work the user should be able to see at a glance. Add a third indicator ⛓ N sourced from tools.async_delegation.active_count() — the count of delegations still in the 'running' state. Renders in the plain-text builder and the styled-fragment builder across the same width tiers as the other two (omitted on the narrow <52 tier), guarded so a raising active_count() leaves the snapshot at 0.	2026-06-23 11:09:08 -07:00
Teknium	6cc07b6cd0	feat(discord): render reasoning as -# subtext via display.reasoning_style (#51168) Adds a per-platform display.reasoning_style setting (code | blockquote | subtext) controlling how the show_reasoning summary renders on the gateway. Discord defaults to "subtext" (-# small grey metadata text); every other platform keeps the fenced code block. Resolves through the existing display.platforms.<platform>.reasoning_style override chain.	2026-06-23 10:44:02 -07:00
xxxigm	f32be4439c	test(install): assert no system-browser auto-detect + snap override repair Replace the old "skips download when a system browser exists" assertions with tests for the new behavior: - no PATH scan for browser command names, and the "use the system browser" path is gone; - find_system_browser consults only an explicit AGENT_BROWSER_EXECUTABLE_PATH override (which still skips the bundled download); - strip_snap_browser_override runs on both install paths and a /snap/* path is rejected, so already-affected installs auto-recover on update.	2026-06-23 10:38:15 -07:00
ethernet	0089bd820f	fix(ci): classify should default to no MCP	2026-06-23 10:32:27 -07:00
ethernet	05c896cf52	ci: refactor paths & clones ci: centralize path-gating behind single orchestrator + all-checks-pass gate Replace the scattered per-workflow detect-changes pattern with a single ci.yml orchestrator that runs the classifier once, then conditionally calls sub-workflows via workflow_call based on lane outputs. A final all-checks-pass job (if: always()) aggregates all results so branch protection only needs to require one check. Changes: - New .github/workflows/ci.yml orchestrator (detect + conditional calls + all-checks-pass gate) - Extend classify_changes.py with scan/deps/mcp_catalog lanes, absorbing supply-chain-audit's internal changes job - Update detect-changes/action.yml to expose the new lane outputs - Convert all 10 PR-gated sub-workflows to workflow_call-only triggers, removing their push/pull_request triggers and per-step detect-changes guards (gating now happens at the orchestrator level) - lint.yml + supply-chain-audit.yml receive event_name as a workflow_call input to replace github.event_name (which is "workflow_call" inside called workflows) - supply-chain-audit.yml: remove internal changes job + *-gate jobs (orchestrator handles gating, booleans arrive as inputs) - contributor-check.yml: remove internal filter step - Update test_classify_changes.py for 6-lane output + new supply-chain test cases	2026-06-23 09:30:50 -07:00
Brooklyn Nicholson	45540cfb5e	ci: run only the lanes a PR affects (python/frontend/site) Heavy PR checks run on every PR because the workflows deliberately avoid `on.paths` filters — a path-gated workflow leaves its required check pending forever when no matching file changes, blocking merge. So a docs-only PR still spins up the TypeScript matrix, the full Python suite, and ruff/ty. Keep every workflow triggering on every PR (checks always report) but gate the expensive steps on what the PR touches. Skipping a step (not the job) leaves the job green, so required checks never hang — the same idiom already proven in contributor-check.yml. A classifier (scripts/ci/classify_changes.py) maps the PR diff to three lanes — python, frontend, site — surfaced as step outputs by a composite action (.github/actions/detect-changes). Fail-open: an empty diff or any .github/ change runs everything; python is a denylist (skipped only when every file is provably prose or a frontend-only package); skills/**/SKILL.md counts as python-relevant since the skill-doc tests read that tree. Non-PR events always run the full pipeline.	2026-06-23 09:30:50 -07:00
Ben	2196584161	fix(slack): transcribe in-app voice messages (audio/mp4) instead of failing Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers (mimetype audio/mp4, filename audio_message.mp4), and Slack sometimes labels them video/mp4. The inbound audio handler derived the cache extension from the mimetype and fell back to ".ogg" for anything not in {.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as .ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from the FILENAME extension, so it received MP4 bytes named .ogg and rejected them. WhatsApp .ogg and uploaded .m4a worked only because their extension happened to match the bytes. Fix: - _resolve_slack_audio_ext(): pick the cache extension from the real filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to .m4a — never the bogus .ogg fallback. Mirrors the video branch and the audio map already in gateway/platforms/bluebubbles.py. - _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4 via the slack_audio subtype / audio_message filename, and route them through the audio path (cached as audio, reported as audio/*) so they reach STT instead of video understanding. Genuine videos (and slack_video screen recordings) are left on the video path. Verified end-to-end against a real audio-only MP4: old path cached it as .ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects); new path caches it as .mp4 (extension matches bytes -> accepted). Adds inbound-audio tests (previously none): helper unit tests plus _handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled voice clips, and a real video staying on the video path. Confirmed the two voice-message tests fail without the fix (mutation check).	2026-06-23 14:44:12 +05:30
Ben Barclay	45bc4fb37f	feat(relay): declare relevance policy to the connector + document the management plane (#51248 ) The gateway half of Phase 6 Unit ζ: project the agent's existing relevance knobs into the connector's platform-agnostic vocabulary and declare them at boot over the /relay/policy route, so the SAME mention-gating / free-response / allow-bots behavior the agent applies directly also governs relay delivery (and excluded chatter never wakes a scaled-to-zero agent). - gateway/relay/__init__.py: - relay_relevance_policy(): project require_mention -> requireAddress, free_response_channels -> freeResponseScopes, {PLATFORM}_ALLOW_BOTS in {mentions,all} -> allowOtherBots. Reads the fronted platform's config block + bridged top-level keys. Returns None when all-default (the connector's quiet default already matches) or no concrete platform is fronted. - send_relay_policy(): POST /relay/policy authenticated with the gateway's own per-gateway upgrade token (make_upgrade_token — same bearer as the WS upgrade), so the connector attaches it to the authenticated instance, never a body-asserted id. Re-declares every boot (self-healing, full replace). NEVER raises, NEVER blocks boot — relevance is an optimization layered on the δ/ε authorization gate. Reuses the per-gateway secret + the /relay/provision host; no new inbound surface, no new credential. - _policy_url(): ws(s)://…/relay -> http(s)://…/relay/policy. - gateway/run.py: call send_relay_policy() after register_relay_adapter() succeeds (the secret is resolved by then). - docs/relay-connector-contract.md: new §7 documenting per-instance delivery + the management plane (/manage/* + /relay/policy) + the relevance-declaration contract; versioning renumbered to §8. Contract conformance test stays green (§2/§3 tables untouched). Tests: +12 (projection mapping incl. comma-string + top-level fallback; send auth/skip/fail-soft/non-200). Full relay suite 118 pass. 
The connector route is already E2E-proven (connector repo gateway_policy_driver.py); this adds the real gateway send-path it pairs with. This completes Phase 6 (Team Gateway per-user isolation) end to end.	2026-06-23 18:43:19 +10:00
brooklyn!	211ba9c7d3	feat(agent): one-shot LLM helper + llm.oneshot gateway RPC (#51261) A "one-shot" is a single stateless model call that runs OUTSIDE any conversation: it never touches session history, never breaks prompt caching, and returns plain text. UI surfaces need this for small generative chores — a commit message from a diff, a rename suggestion, a summary — where an agent turn would pollute the thread and hand-rolling an LLM call at every call site would be worse. - `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client plumbing (same path as title generation). Two call shapes: explicit instructions/input, or a registered `template` + `variables` (templates own the prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a `commit_message` template. Model selection inherits the live session via `main_runtime`, else the configured aux `task` backend. - `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the session's model when `session_id` resolves. Stateless by construction — no session mutation, cache untouched.	2026-06-23 08:01:50 +00:00
brooklyn!	af7b7f6322	feat(agent): expose coding-context project facts as structured data + project.facts RPC (#51259) Follow-up to the coding-context posture (#43316): that PR detects each repo's verify loop (manifests, package manager, exact test/lint/build commands, context files) and bakes it into the system-prompt snapshot — but only as a string, for the model. Non-prompt consumers (the desktop verify UI) had no way to read it without re-sniffing and drifting from the prompt. Split detection from rendering, keeping one source of truth: - `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured facts; `_project_facts()` now renders it into the same snapshot lines, so the prompt block stays byte-identical (cache-safe). - `project_facts_for(cwd)` resolves the workspace root (git, else marker) and returns the structured facts, or None outside a workspace. - `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP). Tests assert the structured output and that the UI-facing commands never drift from what the prompt block renders (one detector feeds both).	2026-06-23 08:00:01 +00:00
Teknium	bb7ff7dc30	revert(cron): return cron job storage to per-profile (reverts #32117 + #50993) (#51116) * Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)" This reverts commit `660e36f097`. * Revert "fix(cron): anchor cron storage at the default root home (not the active profile)" This reverts commit `a5c09fd176`.	2026-06-22 17:53:50 -07:00
Eri Barrett	ba9e3a491b	feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335 ) * feat(memory): OAuth token storage and refresh for the Honcho provider * feat(memory): refresh the Honcho OAuth token in the client and session * feat(memory): zero-CLI loopback OAuth authorization flow * feat(memory): generic memory-provider OAuth connect endpoints * feat(desktop): memory-provider OAuth connect link * feat(memory): CLI OAuth sign-in with source-tagged authorize links * fix(memory): IP-literal loopback redirect and consent config_path on the authorize link * fix(memory): profile-scope the memory-provider OAuth endpoints * refactor(desktop): generic memory-provider OAuth client functions * docs(memory): trim OAuth module docstrings to the invariants * docs(memory): document OAuth connect as an optional auth method * fix(memory): send home-relative display path to consent, not the absolute path * perf(memory): cache OAuth token expiry in memory to skip the hot-path disk read * fix(memory): log OAuth refresh failures at warning, not debug * feat(memory): fall back to an OS-assigned loopback port when 8765 is taken * test(memory): cover the desktop Connect launcher, status, and provider dispatch * fix(desktop): keep the memory-provider dropdown one size regardless of connect state * fix(desktop): move the memory connect link to the description line, leaving the dropdown untouched * refactor(memory): move OAuth connect routes out of web_server into a memory-layer router * refactor(desktop): import MemoryConnect directly, drop the single-export barrel * fix(memory): launch CLI OAuth sign-in right after the auth choice, not after the wizard * fix(desktop): auto-clear the OAuth error state instead of leaving it sticky * test(honcho): isolate auth-method prompt from deployment-shape wizard tests main's wizard suite scripts the cloud prompts without the OAuth auth-method step; auto-answer it in the shared helper so the answer lists stay shape-only. 
* docs(honcho): document query-adaptive reasoning level (reasoningHeuristic) README never mentioned reasoningHeuristic and listed reasoningLevelCap as an orphaned cap with the wrong default (— vs "high"). Add the query-adaptive scaling note + the reasoningHeuristic/reasoningLevelCap rows (grouped under Dialectic & Reasoning), matching the wording already on the hosted honcho.md page, and add a pointer from the memory-providers overview. * fix(honcho): default the CLI peer prompt to the OAuth consent name The CLI runs the grant with apply_config=False, so the peerName the user just entered at consent was dropped and the wizard's 'Your name' prompt fell back to $USER. Surface it as a transient OAuthCredential.consent_peer_name (set even when config isn't merged) and seed the prompt default from it. * feat(honcho): split OAuth client_id by surface (cli=hermes-agent, desktop=hermes-desktop) resolve_endpoints now picks the client_id from the initiating surface and threads it through authorize -> token exchange -> persisted grant -> refresh, so the CLI and desktop register as distinct OAuth clients. Surface-specific env overrides (HONCHO_OAUTH_CLIENT_ID_CLI/_DESKTOP) win over the generic HONCHO_OAUTH_CLIENT_ID, which still overrides every surface. * feat(honcho): show OAuth vs API key in status; detect existing OAuth in setup status now prints 'Auth: OAuth (clientId, token valid Xm/expired)' instead of masking the OAuth access token as a generic API key; setup notes an existing OAuth grant when re-run. * docs(honcho): drop 'shared pool' wording from unified observation mode help * fix(honcho): cross-process lock around OAuth refresh to prevent grant revocation The in-process threading lock can't stop a sibling process (another profile or the desktop app sharing honcho.json) from replaying the single-use refresh token and tripping reuse-detection, which revokes the whole grant. 
Guard the read-refresh-persist section with an OS file lock on <config>.lock so only one process rotates at a time; the others re-read the freshly-persisted token. Best-effort: platforms without flock degrade to in-process serialization. * refactor(honcho): one OAuth client (hermes-agent) for all surfaces Collapse the per-surface client_id split. CLI and desktop now use a single client_id (hermes-agent); consent branding/UI still adapt via the source query param. One grant identity means no clientId-vs-refresh-token desync that could get the grant revoked. HONCHO_OAUTH_CLIENT_ID still overrides for self-hosting. * fix(honcho): per-session resolves to session_id, never remapped by title Reorder resolve_session_name so stable identifiers win over labels: gateway per-chat key first, then the per-session session_id, then the cwd map / title. A (possibly auto-generated) title can no longer remap a live per-session conversation onto a second Honcho session mid-stream — fixes the desktop, which is per-conversation via session_id. Consequence: a gateway's per-chat key now also wins over a title (titles never remap a stable id).	2026-06-22 19:16:47 -05:00
Brooklyn Nicholson	833710d33e	Merge remote-tracking branch 'origin/main' into pr-50994 # Conflicts: # tools/computer_use/cua_backend.py	2026-06-22 18:48:07 -05:00
Brooklyn Nicholson	88e136448d	fix(agent): shrink anthropic-native image history Retry image-size rejections by rewriting Anthropic base64 image source blocks, not just OpenAI-style image_url parts.	2026-06-22 18:23:21 -05:00
Teknium	87c4a5ebb8	feat(background-review): aux-model selector for the self-improvement review (#49252) Adds auxiliary.background_review.{provider,model} (default auto = main chat model — unchanged). Set it to a different, cheaper model and the post-turn self-improvement review runs there for ~3-5x lower cost. Cache-aware by design: the main chat is warm in the prompt cache, so the default full-history replay on the main model is cheap cache reads — left exactly as-is. A different model can't reuse that cache (different key), so when (and only when) routed to a different model the fork replays a compact digest instead of the full transcript, minimising what it cold-writes on the aux model. Same model -> full replay; different model -> digest. Quality holds in benchmarks: memory capture identical, skill near-identical. Nothing changes unless you opt in by naming a different model. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-22 14:54:53 -07:00
Teknium	660e36f097	fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993 ) The #32091 fix moved every profile's cron jobs into one shared root store, but never wired the execution-scoping half it recommended: a job still ran under whichever profile's ticker picked it up, not its owning profile. So a job created under `hermes -p donna` could execute with the root profile's .env / config.yaml / credentials. - jobs.py: create_job auto-captures the active profile (explicit profile= override available) and stores it on the job; resolve_profile_home() maps a profile name to its HERMES_HOME; legacy jobs backfill to 'default'. - scheduler.py: run_job applies the job's profile via a scoped HERMES_HOME override (env var + in-process ContextVar) before any .env/config/script load, restored in finally. tick() routes profile-mismatched jobs to the single-worker sequential pool so the env mutation can't race. - cronjob tool threads profile through (NOT exposed in the model schema, to avoid cross-profile privilege escalation); hermes cron add gains --profile. E2E verified against a temp HERMES_HOME with a real profile dir: a root-profile ticker runs a profile='donna' job with HERMES_HOME=donna during execution and restores the ticker env afterward.	2026-06-22 14:54:28 -07:00
Tranquil-Flow	15880da8bb	fix(file_tools): resolve tilde using profile home for file operations (#48552) File tools (read_file, write_file, patch, list_directory, etc.) used os.path.expanduser() which reads the gateway process HOME env var. In Docker/systemd/s6 deployments where the gateway HOME differs from interactive sessions, tilde expanded to the wrong directory. Add _expand_tilde() helper that delegates to get_subprocess_home() when available, falling back to os.path.expanduser(). Replace all 9 expanduser() call sites in file_tools.py with _expand_tilde().	2026-06-23 03:17:47 +05:30
kshitijk4poor	c080b2dc3e	fix(gateway): redact credentials from TUI approval prompts (#48456 ) Follow-up to #50767, which redacted the chat-platform (_approval_notify_sync) and SSE/API (_approval_notify) approval transports. The TUI JSON-RPC transport is the third egress and was missed: three register_gateway_notify callbacks in tui_gateway/server.py emitted the raw approval_data — including the unredacted command Tirith flagged — straight to the TUI client via _emit. Route all three registrations through a new module-level _emit_approval_request() helper that redacts payload['command'] via the shared gateway.run._redact_approval_command seam before emitting, matching the pattern used for the other two transports. Completes the whole-bug-class fix for #48456. Tests: assert the helper emits a redacted command (real credential pattern), handles missing/None command, and a wiring guard that no registration emits the raw payload directly (only the helper may). Both mutation-checked. The #48456 fix series originated from @liuhao1024's #48462 — credit to them for the original report and chat-platform fix; this completes the remaining transport. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-23 03:14:18 +05:30
kshitijk4poor	0e69cd4b37	fix(memory): honor configured char limits in the no-agent on-disk store Follow-up to the /memory approve fresh-store fix. Both the CLI fallback and the messaging-gateway handler built a bare MemoryStore() with the hardcoded default char limits (2200/1375), ignoring the user's configured memory.memory_char_limit / user_char_limit. A live agent honors those overrides (agent/agent_init.py), so an approval applied without a live agent could accept a write the user's lower cap would reject, or vice versa. Extract a shared tools.memory_tool.load_on_disk_store() factory that reads the configured limits (falling back to defaults if config can't load) and wire both the CLI and gateway handlers to it, closing the gap on both surfaces and de-duplicating the construction block.	2026-06-23 03:10:53 +05:30
Max Hsu	3147cbb136	fix(memory): apply /memory approve against a fresh store when no live agent The CLI /memory slash handler (cli_commands_mixin._handle_memory_command) passed self.agent._memory_store straight through, which is None when the command runs without a live agent — e.g. /memory approve from the Desktop GUI. The shared write-approval handler then returns "memory store unavailable" and applies nothing, even with built-in memory enabled and pending writes present. Fall back to a freshly loaded on-disk MemoryStore when no live store is available, mirroring the gateway path (gateway/slash_commands.py). It persists to the same MEMORY/USER.md and creates MEMORY.md on the first approved write. Fixes #46783 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 03:10:53 +05:30
kshitijk4poor	100e7be20e	fix(security): deny root-level credential stores in media delivery The media-delivery denylist in gateway/platforms/base.py enumerated only .env/auth.json/credentials/config.yaml under HERMES_HOME, so other credential stores that live at the root fell through and could be auto-attached to chat replies. The reported case: the Google Workspace skill's google_token.json refreshes every turn, bumping its mtime to 'now', which kept passing the strict-mode recency window and re-sent the OAuth token on every reply. Extend the explicit per-file denylist to mirror the canonical credential set already enforced by the read/write guards in agent/file_safety.py: google_token.json, google_oauth_pending.json, auth/google_oauth.json, .anthropic_oauth.json, webhook_subscriptions.json, cache/bws_cache.json, auth.lock, and the pairing/ token directory. Targeted per-file additions (not a blanket ~/.hermes deny, which was declined in #32090/#34425 because it would block skills/, logs/, and ad-hoc agent-written deliverables). mcp-tokens/ (#37222) and state.db/kanban.db (#41071) are left to their sibling targeted PRs. Reported-by: xxxigm (#50912)	2026-06-23 02:56:48 +05:30
infinitycrew39	91c465f6e7	test(discord): add regression test for 100-command sync limit Add a test to verify that _safe_sync_slash_commands deletes obsolete commands before creating new ones. This ensures we never temporarily exceed Discord's 100-command limit during sync, which would trigger error 30032 and break all slash commands. This test guards against the regression where sync could fail even though the registration cap was properly enforced.	2026-06-22 13:58:33 -07:00
helix4u	ae7e857420	fix(cron): deliver max-iteration fallback reports	2026-06-22 13:57:59 -07:00
helix4u	3972701424	fix(agent): complete final text on last turn	2026-06-22 13:57:59 -07:00
Teknium	0f741cef28	fix(tests): update cua install tests for cross-platform support f-trycua's #50855 test file predated the cross-platform PR (#50552) and reintroduced two stale tests asserting Linux is unsupported (test__non_macos_, patching platform.system="Linux" and expecting a no-op/warn). Linux + Windows are supported now, so install proceeds on those platforms. Restore main's cross-platform-correct versions: test__on_unsupported_platform_ using FreeBSD as the genuinely unsupported case.	2026-06-22 13:41:03 -07:00
Francesco Bonacci	5f1d23cfb2	fix(computer-use): delete broken pre-install asset probe; trust the upstream installer `hermes computer-use install` refused to install on Linux, Windows, and macOS x86_64 because the pre-install asset probe was hitting the wrong GitHub endpoint AND duplicating tag-resolution logic the upstream installer already does correctly. `_check_cua_driver_asset_for_arch()` queried `https://api.github.com/repos/trycua/cua/releases/latest`. On trycua/cua: - cua-driver-rs releases (the binary the installer fetches) are marked prerelease on every cut. GitHub's `/releases/latest` explicitly skips prereleases. - The Python package releases (`cua-agent`, `cua-computer`, `cua-train`) are non-prerelease and end up as the "latest" instead. Live API check today: $ curl -sf https://api.github.com/repos/trycua/cua/releases/latest \ \| jq '{tag:.tag_name, asset_count: (.assets\|length)}' { "tag": "agent-v0.8.3", "asset_count": 0 } The probe sees zero assets, prints "Latest CUA release has no Linux x86_64 asset", and skips install on every Linux / Windows / macOS-x86_64 host — even though the cua-driver-rs-v0.6.0 release ships 19 binary assets covering all those platforms. Filtering `/releases?per_page=N` for the `cua-driver-rs-v` prefix fixes the bug, but it duplicates tag-resolution logic the upstream `_install-rust.sh` already does correctly via `CUA_DRIVER_RS_BAKED_VERSION` (auto-baked by CD on every release, with a `/releases?per_page=N` API fallback for dev checkouts). The right answer is to trust that contract instead of mirroring it in Python where it can drift. Two paths get the same outcome without the probe: 1. Fresh install: run `install.sh` directly. It has the baked release tag, fetches the right asset, and errors with a clear message on missing-arch downloads. No preflight needed. 2. 
Upgrade path: `cua_driver_update_check()` (separately added) shells `cua-driver check-update --json` against the installed binary, which returns the canonical update answer from the same source the installer uses. - `hermes_cli/tools_config.py`: delete `_check_cua_driver_asset_for_arch` and its two call sites in `install_cua_driver`. Replace with an inline comment near the top of the module explaining the rationale. - `tests/hermes_cli/test_install_cua_driver.py`: drop the `TestCheckCuaDriverAssetForArch` block. Add `TestArchProbeRemoval` with three regressions: - `test_probe_function_is_gone` — asserts the deleted helpers stay deleted. - `test_fresh_install_does_not_call_github_api` — asserts the install path doesn't hit GitHub directly from Python anymore. - `test_upgrade_with_binary_does_not_call_github_api_directly` — same for the upgrade path. All 9 `test_install_cua_driver` tests pass. Reported by @teknium1 while testing on a headed Ubuntu host.	2026-06-22 13:41:03 -07:00
Austin Pickett	2a58fee1a1	fix(api): allow dashboard updates for git checkouts in containers (#51005) Salvages #50469 by @libre-7. _dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update. Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case. Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution. Co-authored-by: libre-7 <libre-7@users.noreply.github.com>	2026-06-22 15:55:33 -04:00
Teknium	6681f28d5b	fix(telegram): disable DM topic mode when last binding is pruned Follow-up to #31501. When the send-fallback prune removes a chat's final telegram_dm_topic_bindings row, also flip telegram_dm_topic_mode.enabled to 0 in the same transaction. Without this, a user who turns topics off in the Telegram client (rather than via /topic off) leaves enabled=1 with zero lanes: _recover_telegram_topic_thread_id keeps treating the chat as topic-enabled and lobby messages keep hunting for bindings that no longer exist. Clearing the flag makes recovery fully stand down once the dead topics are gone. Adds 3 regression tests covering the last-binding clear, the multi-binding no-op, and the unmatched-prune no-op.	2026-06-22 12:29:05 -07:00
xxxigm	11246dbe21	tests: regression coverage for stale topic-binding prune (#31501 ) Thirteen tests across four layers: * ``SessionDB.delete_telegram_topic_binding`` — pin the new helper's contract: removes only the (chat_id, thread_id) row it was asked about, leaves siblings alone, returns 0 silently when the row never existed, and is a no-op on a pristine database whose topic-mode tables haven't been migrated yet. * ``TelegramAdapter._prune_stale_dm_topic_binding`` — the glue must drop the binding when ``self._session_store._db`` exposes the helper, swallow exceptions so a failed cleanup never breaks the user-facing send, and refuse to issue a DELETE for ``chat_id=None`` / ``thread_id=None`` so a bookkeeping miss can't accidentally null-match every row. * Source-level guards on ``TelegramAdapter.send`` and ``_send_message_with_thread_fallback`` — the prune call must sit beside the two existing "Thread X not found, retrying without message_thread_id" warnings, before the retry runs, so a future refactor can't silently drop the cleanup wire. * End-to-end semantic — once a topic is pruned, the ``GatewayRunner._recover_telegram_topic_thread_id`` walk steers future inbound messages to the surviving binding instead of the dead one. This is the exact behaviour change the bug report's reproduction asks for: no more landings in the wrong topic until the operator hand-edits ``state.db``. Refs #31501	2026-06-22 12:29:05 -07:00
Teknium	30e5d0092d	feat(computer-use): add whole-screen/desktop capture target capture(app='screen'|'desktop') now resolves to the OS shell/desktop window (Windows Progman/WorkerW desktop or Shell_TrayWnd taskbar, macOS Finder/Dock) so 'show me my screen' and 'click the taskbar' work. Previously capture() only matched application windows, and the schema advertised 'or the whole screen' without any code path delivering it. cua-driver is window-oriented (no virtual-desktop or per-monitor MCP tool), so a single image still cannot span multiple monitors — the schema now states this and the no-desktop-window path returns a clear message instead of silently grabbing the frontmost app.	2026-06-22 12:21:58 -07:00
jeeves-assistant	5250335863	fix(computer-use): route CuaDriver vision capture via get_window_state cua-driver 0.6.x removed the standalone screenshot MCP tool, so capture(mode='vision') hit 'Unknown tool: screenshot' and returned a 0x0 image with no PNG while som/ax (which use get_window_state) still worked. Route vision through get_window_state(capture_mode='vision'). Salvaged from PR #50771; same fix submitted earlier as #39262 by @Tranquil-Flow.	2026-06-22 12:21:58 -07:00
Teknium	2ba1cfeb2e	feat(goals): completion contracts for /goal — evidence-based judging (#50501 ) Adds an optional structured completion contract to the standing-goal loop, adapted from OpenAI Codex's /goal guidance (a durable objective works best when it names what done means, how to prove it, what not to break, what's in scope, and when to stop). A contract has five optional fields — outcome, verification, constraints, boundaries, stop_when. When set, the continuation prompt tells the agent to target the verification surface and respect constraints, and the judge marks the goal done only when the verification criterion is met with concrete evidence (command result, file excerpt, test output) instead of a loose "looks done" claim. This tightens the most common /goal failure mode: premature completion / endless over-continuation on an underspecified goal. Two ways to set a contract, both backward compatible (bare /goal <text> behaves exactly as before): - /goal draft <objective> — expands plain text into a full contract via the goal_judge aux model (cache-safe side call), falls back to a free-form goal if the model is unavailable. - /goal <text> with inline 'field: value' lines (verify:, constraints:, boundaries:, stop when:, ...). Plain goals with an incidental colon are not mangled — only known field prefixes are pulled out. - /goal show prints the active contract. Contracts persist in SessionDB.state_meta alongside the goal (survive /resume), compose with /subgoal criteria, and old goal rows load unchanged. CLI + every gateway platform via the shared GoalManager engine; zero new model tools. Tests: +18 in tests/hermes_cli/test_goals.py (parse/serialize/judge-prompt/ draft/fallback), 73/73 green; 42/42 across the broader goal test surface; live E2E roundtrip (set -> persist -> reload -> contract-aware prompts) green.	2026-06-22 12:20:09 -07:00
Teknium	ff08e60c63	feat(skills): add cloudflare-temporary-deploy optional skill (#50849 ) * chore: re-trigger CI (workflows did not dispatch on prior head) * feat(skills): add cloudflare-temporary-deploy optional skill Optional web-development skill teaching the agent to deploy a Worker to a live workers.dev URL with no Cloudflare account via 'wrangler deploy --temporary' (Wrangler 4.102.0+). Cloudflare provisions a throwaway, claimable account valid for 60 minutes — ideal for an autonomous write->deploy->verify loop with no OAuth/signup hard stop. - SKILL.md: when/when-not, prereqs (unauth requirement, version floor), step-by-step deploy + verify flow, product limits table, pitfalls (hidden flag, stale global wrangler, auth-present error, rate limits, workers.dev edge cache), verification. - scripts/parse_deploy_output.py: stdlib-only parser extracting live URL, claim URL, account name/state, expiry, deploy status from wrangler output. - tests/skills/test_cloudflare_temporary_deploy_skill.py: 16 tests incl. a real-output regression case. Verified live end-to-end: temporary account created with no creds, deployed to a live URL, curl confirmed body, redeploy reused the account.	2026-06-22 12:14:30 -07:00
kshitij	5937b95192	Merge pull request #50773 from NousResearch/salvage/43719-dashboard-plugin-rce fix(security): restrict dashboard plugin backend auto-import to bundled plugins — defense-in-depth (#43719)	2026-06-22 22:57:33 +05:30
Teknium	f1e6d39a74	feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842 ) * feat(computer_use): disable cua-driver telemetry by default, add opt-in cua-driver ships anonymous PostHog usage telemetry ENABLED by default upstream (fires cua_driver_install / cua_driver_doctor events to eu.i.posthog.com). Hermes now disables it for our users unless they explicitly opt in. - New config key `computer_use.cua_telemetry` (default false) in DEFAULT_CONFIG. - `cua_backend.cua_driver_child_env()` injects `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is disabled (the default); leaves the var untouched on opt-in so the driver uses its own default. Reads config fail-safe — any error defaults to telemetry off. - Routed every cua-driver spawn site through the policy: MCP backend (StdioServerParameters env), `cua_driver_update_check`, doctor's health_report Popen, the install.sh/install.ps1 runner, and the `--version` / status probes. - Docs: new Telemetry subsection in computer-use.md (EN). - Tests: tests/computer_use/test_cua_telemetry.py — default disables, explicit-false disables, opt-in leaves var untouched, config-failure fails safe, inherited-enabled is overridden off. Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with the var=0 the driver reports "telemetry: disabled via CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs "sending event: cua_driver_doctor". 213 computer_use + install tests green. * fix(dashboard): fold computer_use config category into agent tab The new computer_use.cua_telemetry key created a single-field dashboard config category, tripping test_no_single_field_categories (web_server's invariant that categories with <2 fields must be merged to avoid tab sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the existing onboarding/telegram single-field folds.	2026-06-22 09:57:16 -07:00
iaji	441bd6d8db	fix(slack): split csv mention pattern fallback	2026-06-22 09:44:52 -07:00
devorun	4966268764	fix(slack): honor documented `mention_patterns` wake words The Slack docs document `slack.mention_patterns` as custom wake words that trigger the bot alongside `@mention`, and the config layer bridges the key into the Slack adapter's `config.extra` — but the adapter never read it. With `require_mention` on, a channel message containing a configured wake word (and no literal `<@BOTUID>`) was silently ignored. Every other adapter that documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp, BlueBubbles, Photon) implements it; Slack was the odd one out. Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns` as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid regexes warned and skipped) and `_slack_message_matches_mention_patterns()`, mirroring the existing adapters. Channel mention detection now also triggers on a wake-word match, so the documented field works as described. Adds tests for pattern compilation (list/string/env/invalid-regex) and for the channel-trigger gating with a wake word under require_mention.	2026-06-22 09:44:52 -07:00
Teknium	b1b20270c4	refactor(memory): move write-mirror gating behind MemoryManager interface The success/staged gating and op-expansion for mirroring built-in memory writes to external providers lived in a standalone agent/memory_write_bridge.py helper called inline from two core call sites (tool_executor.py, agent_runtime_helpers.py). That left the mirror decision-making in the agent loop, outside the memory-provider interface. Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the loop now hands over the raw tool result + args and a metadata callback, and the manager decides whether/what to mirror. Both core call sites collapse to a single call; the orphan module is removed. No MemoryProvider ABC change. Tests rewritten as behavior tests against the manager method.	2026-06-22 07:00:42 -07:00
Hao Zhe	027cb649ef	fix(memory): fail closed on unclear write results	2026-06-22 07:00:42 -07:00
Hao Zhe	c7e0501e9b	fix(openviking): drain memory mirror workers on shutdown	2026-06-22 07:00:42 -07:00
Hao Zhe	70e7132e2f	fix(openviking): gate memory writes and add viking_forget Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove. Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.	2026-06-22 07:00:42 -07:00
teknium1	38c56a1e86	fix(computer_use): probe cua-driver-rs release tag, not monorepo releases/latest The install pre-flight asset probe queried trycua/cua's `releases/latest`, which floats across the monorepo's components (agent-, computer-, lume-, train-) — most ship zero binary assets. So the probe false-negatived and hard-blocked `install_cua_driver` (line 770: `if not probe: return False`) BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even though the installer it gates resolves the right tag and would have succeeded. Net effect: the normal enable path (`hermes tools` → Computer Use post-setup, and `hermes computer-use install`) refused to install on every platform this PR claims to support. Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v` tag, and match its assets on OS-token + arch — mirroring what the upstream `install.sh` already does. Fail open if no driver release surfaces (installer remains the source of truth). Adds an OS-token gate so a darwin asset can't satisfy a Linux probe. Tests: updated the install-probe fixtures to the list-of-releases shape with `cua-driver-rs-v` tags + OS-token asset names; added a regression guard (`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo floating-latest case. 25/25 install + 192 computer_use tests green. Verified live: probe returns True for all six platform/arch combos against the real GitHub releases API.	2026-06-22 06:42:30 -07:00
Francesco Bonacci	f2e37549c6	feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux) Make the computer_use toolset platform-agnostic by driving cua-driver on macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces (capability discovery, structuredContent AX tree, opaque element_token, click button enum, explicit mimeType, machine-readable manifest, structured list_windows, structured health_report), each degrading gracefully on older drivers. Adds `hermes computer-use doctor` (drives cua-driver health_report with a per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full typed wrappers for the previously-uncovered cua-driver tools plus a generic call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware system-prompt guidance (host-deterministic, cache-safe), and honors HERMES_CUA_DRIVER_CMD end-to-end. Replaces the macOS-only skills/apple/macos-computer-use skill with a cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans docs. Supersedes #44221 (Windows-enablement salvage of #30660). Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-22 06:42:30 -07:00
Teknium	ff85af3fc7	feat(goals): /goal wait <pid> — park the loop on a background process (#50503 ) * feat(goals): add /goal wait <pid> barrier to park the loop on a background process The /goal loop re-pokes the agent every turn via the post-turn judge. When a goal is gated on a long-running background process (CI poller, build, test matrix, deploy) that produces nothing to judge yet, this spins the agent into 'is it done?' busy-work and burns the turn budget. /goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is skipped, no turn is consumed, no continuation fires, and /goal status shows a parked indicator. The barrier auto-clears the moment the process exits (the agent's notify_on_complete watcher is the natural wake signal), then the next turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear drop it; a dead/stale PID can never wedge the loop. Wired across CLI, gateway, and the mid-run command guard for parity. Barrier persists in SessionDB.state_meta (survives /resume); GoalState gains backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new tests; docs updated. * fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0) The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the target's console process group (bpo-14484). Delegate to the canonical footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback) instead, with a direct-psutil last resort. * feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait Makes the wait barrier automatic. 
Every turn the judge is shown the agent's live background processes (pid, command, uptime, output tail from the process_registry) alongside the goal + response, and can return a new 'wait' verdict instead of continue: {"verdict":"wait","wait_on_pid":N} → park until that process exits {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes evaluate_after_turn acts on the directive (sets the barrier, parks the loop) so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a time-based waiting_until barrier alongside the pid barrier; both auto-clear and can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live registry in via gather_background_processes(). Manual /goal wait stays as an override. Judge verdict contract widened to (verdict, reason, parse_failed, wait_directive); legacy {"done":bool} shape still accepted. * test(goals): update kanban _fake_judge to the 4-tuple judge contract CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the 3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the 4-tuple (+ wait_directive). Update the fake to return None for the directive and accept the background_processes kwarg. * feat(goals): trigger-based wait — park on a process's own signal, not just exit Addresses two gaps in the judge-driven wait: (1) the judge could only express 'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the process's own watch_patterns/notify_on_complete trigger was invisible to the judge. Adds a session-based barrier (waiting_on_session) that releases on the process's OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if started with watch_patterns) its pattern matches — even while the process keeps running. 
list_sessions() now surfaces session_id + watch_patterns/watch_hit/ notify_on_complete so the judge sees the trigger and is told to prefer wait_on_session for trigger processes. Judge verdict gains a {wait_on_session} directive (preferred over pid). Backward-compatible GoalState field; pid + time barriers unchanged. Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive, release on exit, unknown-session, full park→trigger→resume, parse, validation, backcompat load). 105 goal-surface + 85 process_registry tests green.	2026-06-22 06:27:29 -07:00
teknium1	a6ce9b2fbb	fix(picker): keep flat-namespace reseller first-party models in desktop picker OpenCode Go (and OpenCode Zen) showed only a subset of the models they serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of 19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash. Root cause: the picker dedup in build_models_payload strips any model from an aggregator row that overlaps a user-defined provider's catalog (so a local proxy isn't shadowed by OpenRouter). It gated on is_aggregator(), which is True for opencode-go/zen because their flat /v1/models returns bare IDs the model-switch resolver searches. But those are flat-namespace RESELLERS, not routing aggregators — every model they list is first-party, so deduping them against a user proxy that happens to serve a same-named model guts their own catalog. Fix: add is_routing_aggregator() (True only for true routers like OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the picker dedup on it. is_aggregator() is unchanged so model-switch flat catalog resolution keeps working. Both desktop entry points (model.options JSON-RPC and /api/model/options REST) and hermes model share build_models_payload, so all surfaces get the full list. Fixes #47077	2026-06-22 06:09:08 -07:00
Teknium	ef6492b648	fix(gateway): cold-start installed Windows gateway after update when none was running (#50804 ) The post-update gateway resume path (`_resume_windows_gateways_after_update`) only relaunched gateways that were running when the update began — it enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns exactly those. A gateway that had already died between updates (e.g. it was launched attached to a terminal/TUI that later closed, taking the child with it) was never brought back: the Startup-folder / Scheduled-Task autostart entry only fires on the next login, not after an in-place update. So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box whose gateway had quietly died would complete with no gateway running, and the user had no indication anything should have come up. Fix: when no gateway is running at pause time but an autostart entry is installed (`gateway_windows.is_installed()` — an explicit "I want a gateway" signal), return a `cold_start_if_installed` token. The resume step then does a fresh detached spawn via `gateway_windows._spawn_detached()` — the same windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start` uses. It re-checks liveness immediately before spawning so a concurrent start (autostart entry firing) can't produce a duplicate. Gateway-less users (no autostart entry) get nothing forced on them — the pause step still returns None for them. POSIX is unaffected: enabled systemd units already restart via `Restart=always`. Windows-only; best-effort throughout (logs at debug and no-ops on any error). Tests: pause returns the cold-start token only when installed, returns None when not installed, resume cold-starts on the token, and resume skips the cold-start when a gateway is already running.	2026-06-22 06:02:31 -07:00
teknium	e9cd8c5bf3	fix(delivery): drop env-var knob, flag all chunking adapters Follow-up to ScotterMonk's cron-truncation fix: - Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config belongs in config.yaml, not a new HERMES_* env var (.env is secrets only). The actual bug is fixed entirely by the adapter-aware skip; the configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant again, collapsing the max_output=0 disable branch and the audit-vs-truncation threshold divergence. - Flag the remaining verified-chunking adapters (slack, matrix, feishu, mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles, yuanbao) with splits_long_messages=True so the fix covers the whole bug class, not just Discord/Telegram. Each verified to chunk in its own send() via truncate_message(). - SMS deliberately left False: it chunks for normal replies but a multi-segment cron blast is cost-bearing; the 4000-cap + file save is the safer default there. - Update tests: drop the two env-override tests, add a test asserting a save failure during truncation (non-chunking) propagates.	2026-06-22 05:41:22 -07:00
ScotterMonk	86e4521cb1	fix(delivery): make cron output truncation configurable + adapter-aware Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting adapter-side message splitting. Discord and Telegram both chunk long content natively in their send() via truncate_message(), but the delivery router truncated to 3800 chars + footer before the adapter ever saw the full payload — so long cron output was cut short instead of being delivered as multiple messages (issue #50126). Changes: - HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable (default 4000, backward compatible). Set to 0 to disable truncation. - TRUNCATED_VISIBLE (3800) removed — visible portion now derived dynamically from max_output minus the actual footer length. - New BasePlatformAdapter.splits_long_messages capability flag (default False). Adapters that chunk in send() set True; delivery skips truncation for them but still saves full output to disk as audit. - Flagged Discord and Telegram (both verified to chunk in send()). Fixes #50126	2026-06-22 05:41:22 -07:00
Teknium	eecb5b9dd1	fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') (#50784 ) * chore: re-trigger CI (workflows did not dispatch on prior head) * fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') Installer checkouts are shallow (git clone --depth 1). The CLI banner and hermes update --check both did a plain git fetch (silently unshallowing the repo) then git rev-list --count HEAD..origin/main, which counts across the shallow boundary and prints a huge nonsense number like '12492 commits behind'. Detect shallow up front, fetch with --depth 1 to preserve the boundary, and compare tip SHAs instead of counting: - banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind (renders as 'update available') instead of the bogus count. - _cmd_update_check: reports presence-only on shallow clones. Full clones keep the exact count path unchanged. Mirrors the desktop fix in apps/desktop/electron/main.cjs (commit `2950c6fa2`).	2026-06-22 05:39:11 -07:00
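
The shallow-clone handling described in the last entry (`eecb5b9dd1`) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in the repository: the function name `check_for_update`, the injectable `git` runner, and the numeric value given to `UPDATE_AVAILABLE_NO_COUNT` are hypothetical. Only the git commands themselves and the sentinel's name come from the commit message. On a shallow checkout the sketch treats any difference between the two tip SHAs as "update available", which is a simplification.

```python
import subprocess
from typing import Callable, List, Optional

# Sentinel for "behind, but the exact count is unknowable on a shallow clone".
# The name comes from the commit message; the value -1 is an assumption.
UPDATE_AVAILABLE_NO_COUNT = -1

GitRunner = Callable[[List[str]], str]


def run_git(args: List[str]) -> str:
    """Run a git subcommand in the current directory and return stripped stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def check_for_update(
    git: GitRunner = run_git, remote: str = "origin", branch: str = "main"
) -> Optional[int]:
    """Return None when up to date, UPDATE_AVAILABLE_NO_COUNT when a shallow
    clone differs from the remote tip, or the exact number of commits behind
    on a full clone."""
    upstream_ref = f"{remote}/{branch}"
    is_shallow = git(["rev-parse", "--is-shallow-repository"]) == "true"

    if is_shallow:
        # Keep the shallow boundary intact: fetch only the tip commit.
        git(["fetch", "--depth", "1", remote, branch])
        local_tip = git(["rev-parse", "HEAD"])
        remote_tip = git(["rev-parse", upstream_ref])
        # Counting across the boundary would be meaningless, so compare SHAs.
        return None if local_tip == remote_tip else UPDATE_AVAILABLE_NO_COUNT

    # Full clone: the exact count is reliable.
    git(["fetch", remote, branch])
    behind = int(git(["rev-list", "--count", f"HEAD..{upstream_ref}"]))
    return behind if behind > 0 else None
```

Passing the runner as a parameter keeps the decision logic testable without a real repository. A caller would render `UPDATE_AVAILABLE_NO_COUNT` as a plain "update available" message rather than a commit count.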

6040 Commits