osgrep

Semantic search for your codebase.

License: Apache 2.0

Natural-language search that works like grep. Fast, local, and works with coding agents.

  • Semantic: Finds concepts ("auth logic"), not just strings.
  • Local & Private: 100% local embeddings via transformers.js.
  • Auto-Isolated: Each repository gets its own index automatically.
  • Adaptive: Runs fast on desktops, throttles down on laptops to prevent overheating.
  • Agent-Ready: Native integration with Claude Code.

Quick Start

  1. Install

    npm install -g osgrep    # or pnpm / bun

  2. Setup (Recommended)

    osgrep setup

    Downloads embedding models (~150MB) upfront. If you skip this, models download automatically on first use.

  3. Search

    cd my-repo
    osgrep "where do we handle authentication?"

    The first search in a repository indexes it automatically. Each repository gets its own isolated index, so switching between repos needs no manual configuration. If the background server is running (osgrep serve), searches go through the hot daemon; otherwise the CLI falls back to a standalone search and indexes on demand.

Coding Agent Integration

Claude Code

  1. Run osgrep install-claude-code
  2. Open Claude Code (claude) and ask it questions about your codebase.
  3. The plugin’s hooks auto-start osgrep serve in the background and shut it down on session end. Claude will use osgrep --json via Bash for semantic searches automatically.

Commands

osgrep search

The default command. Searches the current directory using semantic meaning.

The CLI prefers the hot server when available (via .osgrep/server.json), falling back to standalone search automatically.
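
To illustrate the server-discovery step, here is a minimal Python sketch of how a client could decide between the hot server and standalone search. It assumes .osgrep/server.json is a JSON object with port and pid keys (the README only says the file holds "port/pid"). The liveness check is also an assumption, and find_hot_server is a hypothetical helper, not part of osgrep.

```python
import json
import os
from pathlib import Path


def find_hot_server(repo_root):
    """Return the port of a live server for this repo, or None to signal fallback.

    Assumes the lock file is JSON with "port" and "pid" keys.
    """
    lock_path = Path(repo_root) / ".osgrep" / "server.json"
    try:
        info = json.loads(lock_path.read_text())
        port, pid = int(info["port"]), int(info["pid"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # no lock file, or it is unreadable: use standalone search
    if pid <= 0:
        return None  # not a real process id
    try:
        # POSIX only: signal 0 checks that the process exists without signalling it.
        os.kill(pid, 0)
    except ProcessLookupError:
        return None  # stale lock file left behind by a dead server
    except PermissionError:
        pass  # the process exists but belongs to another user
    return port
```

A caller would send the query to the returned port when one is found, and run the normal CLI search when the result is None.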

osgrep "how is the database connection pooled?"

Options:

Flag             Description                                      Default
-m <n>           Max total results to return.                     25
--per-file <n>   Max matches to show per file.                    1
-c, --content    Show full chunk content instead of snippets.     false
--scores         Show relevance scores (0.0-1.0).                 false
--compact        Show file paths only (like grep -l).             false
-s, --sync       Force re-index changed files before searching.   false
--json           Dense output for agents.                         false

Examples:

# General concept search
osgrep "API rate limiting logic"

# Deep dive (show more matches per file)
osgrep "error handling" --per-file 5

# Just give me the files
osgrep "user validation" --compact
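
For scripting, the same flags can be assembled programmatically. The Python sketch below builds an osgrep command line from the options in the table above and runs it with subprocess. build_osgrep_command and run_osgrep are hypothetical helpers. The README does not describe the --json output schema, so the sketch returns raw stdout rather than parsing it.

```python
import shutil
import subprocess


def build_osgrep_command(query, max_results=None, per_file=None,
                         compact=False, as_json=False):
    """Assemble an argv list using only flags listed in the options table."""
    cmd = ["osgrep", query]
    if max_results is not None:
        cmd += ["-m", str(max_results)]
    if per_file is not None:
        cmd += ["--per-file", str(per_file)]
    if compact:
        cmd.append("--compact")
    if as_json:
        cmd.append("--json")
    return cmd


def run_osgrep(query, cwd=".", **options):
    """Run osgrep inside `cwd`; return raw stdout, or None if osgrep is not on PATH."""
    if shutil.which("osgrep") is None:
        return None
    result = subprocess.run(
        build_osgrep_command(query, **options),
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, build_osgrep_command("error handling", per_file=5) produces the same invocation as the "deep dive" example above.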

osgrep index

Manually indexes the repository. Useful if you want to pre-warm the cache or if you've made massive changes outside of the editor.

  • Respects .gitignore and .osgrepignore.
  • Smart Indexing: Only embeds code and config files. Skips binaries, lockfiles, and minified assets.
  • Adaptive Throttling: Monitors your RAM and CPU usage. If your system gets hot, indexing slows down automatically.

osgrep index              # Index current dir
osgrep index --dry-run    # See what would be indexed

osgrep serve

Runs a lightweight HTTP server with live file watching so searches stay hot in RAM.

  • Keeps LanceDB and the embedding worker resident for <50ms responses.
  • Watches the repo (via chokidar) and incrementally re-indexes on change.
  • Health endpoint: GET /health
  • Search endpoint: POST /search with { query, limit, path, rerank }
  • Writes lock: .osgrep/server.json with port/pid

Usage:

osgrep serve             # defaults to port 4444
OSGREP_PORT=5555 osgrep serve

Claude Code hooks start/stop this automatically; you rarely need to run it manually.
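
If you do want to query the server directly, the following Python sketch builds a request for the search endpoint. The field names (query, limit, path, rerank) and the default port come from this README. The loopback host, the JSON content type, and a JSON response body are assumptions, and both function names are hypothetical.

```python
import json
import urllib.request


def build_search_request(query, limit=10, port=4444, path=None, rerank=None):
    """Build a POST /search request; optional fields are sent only when set."""
    body = {"query": query, "limit": limit}
    if path is not None:
        body["path"] = path
    if rerank is not None:
        body["rerank"] = rerank
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/search",  # assumed: server listens on loopback
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling search("auth logic", limit=5) requires a running osgrep serve; build_search_request can be inspected without one.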

osgrep list

Lists all indexed repositories (stores) and their metadata.

osgrep list

Shows store names, sizes, and last modified times. Useful for seeing what's indexed and cleaning up old stores.

osgrep doctor

Checks installation health, model paths, and database integrity.

osgrep doctor

Performance & Architecture

osgrep is designed to be a "good citizen" on your machine:

  1. The Thermostat: Indexing adjusts concurrency in real-time based on memory pressure and CPU speed. It won't freeze your laptop.
  2. Smart Chunking: Uses tree-sitter to split code by function/class boundaries, ensuring embeddings capture complete logical blocks.
  3. Deduplication: Identical code blocks (boilerplate, license headers) are embedded once and cached, saving space and time.
  4. Hybrid Search: Uses Reciprocal Rank Fusion (RRF) to combine Vector Search (semantic) with FTS (keyword) for best-of-both-worlds accuracy.
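
To make the hybrid search step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists. It shows the general technique, not osgrep's actual implementation. The constant k=60 is a commonly used default and an assumption here, and the file names are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["auth.ts", "session.ts", "db.ts"]      # semantic ranking
keyword_hits = ["login.ts", "auth.ts", "session.ts"]  # full-text ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Items that appear in both lists accumulate score from each, so auth.ts and session.ts end up ahead of results found by only one method.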

Configuration

Automatic Repository Isolation

osgrep automatically creates a unique index for each repository based on:

  1. Git Remote URL (e.g., github.com/facebook/react → facebook-react)
  2. Git Repo without Remote → directory name + hash (e.g., utils-7f8a2b3c)
  3. Non-Git Directory → directory name + hash for collision safety

Examples:

cd ~/work/myproject        # Auto-detected: owner-myproject
osgrep "API handlers"

cd ~/personal/utils        # Auto-detected: utils-abc12345
osgrep "helper functions"

Stores are isolated automatically — no manual --store flags needed!
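
The naming rules above can be sketched in Python as follows. This is an illustration under assumptions: the README does not say which hash is used or what is hashed, so the sketch takes the first 8 hex characters of a SHA-256 of the absolute path. Both helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def store_name_from_remote(remote_url):
    """Derive 'owner-repo' from a Git remote URL (https or git@host:owner/repo)."""
    match = re.search(r"[:/]([^/:]+)/([^/]+?)(?:\.git)?/?$", remote_url.strip())
    if match is None:
        return None
    owner, repo = match.groups()
    return f"{owner}-{repo}"


def store_name_from_path(directory):
    """Fallback: directory name plus a short hash (assumed: SHA-256 of the path)."""
    path = Path(directory).resolve()
    digest = hashlib.sha256(str(path).encode("utf-8")).hexdigest()[:8]
    return f"{path.name}-{digest}"
```

With these helpers, store_name_from_remote("github.com/facebook/react") gives facebook-react, and two directories named utils in different locations get different suffixes.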

Manual Store Management

  • View all stores: osgrep list
  • Override auto-detection: osgrep --store custom-name "query"
  • Clean up old stores: rm -rf ~/.osgrep/data/store-name
  • Data location: ~/.osgrep/data
  • Env Vars:
    • MXBAI_STORE: Override default store name
    • OSGREP_PROFILE=1: Enable performance profiling logs

Development

pnpm install
pnpm build        # or pnpm dev
pnpm format       # biome check

Troubleshooting

  • Index feels stale? Run osgrep index to refresh.
  • Weird results? Run osgrep doctor to verify models.
  • Need a fresh start? Delete ~/.osgrep/data and re-index.

License

Licensed under the Apache License, Version 2.0.
See Apache-2.0 for details.
