Layer 1 · Mirror

Usage & configuration

The reference for the Mirror layer: every mirror command, plus configuration, branch safety, and scheduling. New here? Start with QUICKSTART. For the knowledge layer, see knowledge-layer.md.

Command reference

contextlake sync runs the whole mirror pipeline end to end; each stage is also available as its own command:

The contextlake sync pipeline: fetch, then clone, then update, then branches, then verify, then audit.

status - Check Current Synchronization Status

Shows the current state of your workspace compared to GitLab.

contextlake status

Example output:

GitLab projects (cached): 128      # repos you can see on GitLab
Local repositories:       128      # repos cloned in your workspace
Synchronized:             127      # present in both and matching
Missing:                  1        # on GitLab but not cloned yet
Extra:                    1        # cloned locally but not on GitLab

A fully synced workspace shows 0 for both Missing and Extra.
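The Synchronized, Missing, and Extra counts amount to set arithmetic over repository paths. Here is a minimal Python sketch, assuming both sides are keyed by the repo's path relative to work_dir and that "matching" just means the path exists on both sides (the real check may compare more). status_counts is a hypothetical name, not part of contextlake.

```python
def status_counts(gitlab_paths, local_paths):
    """Compare cached GitLab project paths with local clone paths (hypothetical helper)."""
    remote = set(gitlab_paths)
    local = set(local_paths)
    return {
        "gitlab": len(remote),
        "local": len(local),
        "synchronized": len(remote & local),  # present on both sides
        "missing": len(remote - local),       # on GitLab, not cloned yet
        "extra": len(local - remote),         # cloned locally, not on GitLab
    }


if __name__ == "__main__":
    remote = ["team/api", "team/web", "team/docs"]
    local = ["team/api", "team/web", "old/archived"]
    print(status_counts(remote, local))
    # {'gitlab': 3, 'local': 3, 'synchronized': 2, 'missing': 1, 'extra': 1}
```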

fetch - Fetch All GitLab Projects

Retrieves the list of all projects in the configured GitLab group (gitlab_group) and caches it locally.

contextlake fetch

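The list is written to the projects cache in cache_dir (cache_file and cache_json; see Settings reference). status reports against this cache, and audit reads it too, including each project's GitLab created_at date captured here.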
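Enumerating a group comes down to paging through the GitLab REST API. A minimal sketch of that call, assuming a personal access token (for example the value of GITLAB_TOKEN) and a placeholder base URL; list_group_projects and its return shape are hypothetical, and contextlake's actual client (or glab) and its cache file format may differ. It calls GET /api/v4/groups/:id/projects with include_subgroups=true and follows the X-Next-Page response header. Note that path_with_namespace includes the top-level group name.

```python
import json
import urllib.parse
import urllib.request


def list_group_projects(base_url, group, token, fetch_page=None):
    """Return [{"path": ..., "created_at": ...}] for every project in a group.

    Calls GET /api/v4/groups/:id/projects with include_subgroups=true and
    follows the X-Next-Page response header until it comes back empty.
    """

    def default_fetch(url):
        request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
        with urllib.request.urlopen(request, timeout=60) as response:
            return json.load(response), response.headers.get("X-Next-Page", "")

    fetch_page = fetch_page or default_fetch  # injectable for offline testing
    group_id = urllib.parse.quote(group, safe="")  # "parent/child" -> "parent%2Fchild"
    projects, page = [], "1"
    while page:
        query = urllib.parse.urlencode(
            {"include_subgroups": "true", "per_page": 100, "page": page})
        batch, page = fetch_page(f"{base_url}/api/v4/groups/{group_id}/projects?{query}")
        projects.extend({"path": p["path_with_namespace"], "created_at": p["created_at"]}
                        for p in batch)
    return projects
```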

clone - Clone Missing Repositories

Clones any repositories that exist in GitLab but are missing locally.

contextlake clone

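Each clone is bounded by clone_timeout; see Settings reference for the parallelism and retry settings (max_workers, max_retries).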
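The clone step has the shape of a bounded thread pool with a per-clone timeout. A sketch, assuming max_workers and clone_timeout map onto the pool size and the subprocess timeout; clone_one and clone_missing are hypothetical names, and retries, clean_corrupted handling, and the adaptive pool are left out. The run parameter is injectable so the logic can be exercised without network access.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def clone_one(url, dest, timeout, run=subprocess.run):
    """Clone url into dest; True on success, False on a git error or timeout."""
    try:
        # git clone creates any missing parent directories of dest itself
        result = run(["git", "clone", "--quiet", url, str(dest)],
                     capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def clone_missing(missing, work_dir, max_workers=8, clone_timeout=300,
                  run=subprocess.run):
    """Clone {relative_path: url} pairs in parallel, at most max_workers at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            path: pool.submit(clone_one, url, Path(work_dir) / path,
                              clone_timeout, run)
            for path, url in missing.items()
        }
        return {path: future.result() for path, future in futures.items()}
```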

update - Update Existing Repositories

Fetches and pulls the latest changes for all local repositories.

contextlake update

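Repos are pulled subject to the Branch Safety rules: a dirty working tree is skipped (or stashed first, if auto_stash is enabled), and a detached HEAD is skipped rather than failing.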
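Fetch plus fast-forward can be expressed with three plain git calls. A sketch, assuming the dirty-tree check from Branch Safety has already passed; update_repo is a hypothetical helper, and contextlake may use git pull --ff-only or other commands instead.

```python
import subprocess


def update_repo(repo, fetch_timeout=60, pull_timeout=60, run=subprocess.run):
    """Fetch, then fast-forward the current branch; return a short status string."""

    def git(*args, timeout):
        return run(["git", "-C", str(repo), *args],
                   capture_output=True, text=True, timeout=timeout)

    # `git symbolic-ref -q HEAD` exits non-zero when HEAD is detached
    if git("symbolic-ref", "-q", "HEAD", timeout=fetch_timeout).returncode != 0:
        return "skipped: detached HEAD"
    if git("fetch", "--quiet", timeout=fetch_timeout).returncode != 0:
        return "failed: fetch"
    # --ff-only refuses to create a merge commit when local history has diverged
    if git("merge", "--ff-only", "@{upstream}", timeout=pull_timeout).returncode != 0:
        return "failed: not a fast-forward (or no upstream)"
    return "updated"
```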

branches - Switch to Most Active Branches

Analyzes all repositories and switches them to their most active development branch.

contextlake branches

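Repos with a dirty working tree are skipped, and when protect_working_branches is enabled (the default), a repo on a branch outside safe_branches is left where it is; see Branch Safety.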
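This page does not spell out how the most active branch is chosen. As a stand-in only, here is a sketch of one simple heuristic: pick the remote branch whose tip commit is newest, using git for-each-ref. It is not necessarily contextlake's criterion, and parse_refs and most_recent_branch are hypothetical names.

```python
import subprocess


def parse_refs(output):
    """Parse "<unix-timestamp> <remote>/<branch>" lines into (timestamp, name) pairs."""
    pairs = []
    for line in output.splitlines():
        ts, _, name = line.strip().partition(" ")
        if name and not name.endswith("/HEAD"):  # skip the origin/HEAD alias
            pairs.append((int(ts), name))
    return pairs


def most_recent_branch(repo, timeout=30, run=subprocess.run):
    """Return the remote branch whose tip commit is newest (stand-in heuristic)."""
    out = run(
        ["git", "-C", str(repo), "for-each-ref",
         "--format=%(committerdate:unix) %(refname:lstrip=2)",
         "refs/remotes/origin"],
        capture_output=True, text=True, timeout=timeout, check=True,
    ).stdout
    pairs = parse_refs(out)
    return max(pairs)[1] if pairs else None  # newest timestamp wins
```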

verify - Verify Repository Structure

Checks that the local workspace structure matches GitLab exactly.

contextlake verify

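Among other checks, it flags nested .git directories (a repo cloned into a subfolder of itself); see Troubleshooting.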
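A nested clone has a recognizable layout: .git sits one directory too deep. A sketch of detecting it, assuming each expected path is relative to work_dir and the inner folder shares the repo's name; find_nested_clones and demo are hypothetical names.

```python
import tempfile
from pathlib import Path


def find_nested_clones(work_dir, expected_paths):
    """Return expected repo paths whose .git sits one level too deep.

    A nested clone looks like <work_dir>/group/repo/repo/.git instead of
    <work_dir>/group/repo/.git (the repo was cloned into a subfolder of itself).
    """
    nested = []
    for rel in expected_paths:
        repo_dir = Path(work_dir) / rel
        inner = repo_dir / repo_dir.name
        if not (repo_dir / ".git").exists() and (inner / ".git").exists():
            nested.append(rel)
    return nested


def demo():
    """Build a throwaway layout with one healthy and one nested clone."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "team/api/.git").mkdir(parents=True)
        (Path(tmp) / "team/web/web/.git").mkdir(parents=True)
        return find_nested_clones(tmp, ["team/api", "team/web"])


if __name__ == "__main__":
    print(demo())  # ['team/web']
```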

sync - Full Synchronization

Runs the complete synchronization pipeline in sequence.

contextlake sync

This command executes:

  1. fetch - Get latest GitLab project list
  2. clone - Clone missing repositories
  3. update - Update existing repositories
  4. branches - Switch to active branches
  5. verify - Verify structure
  6. audit - Report repo health & age (skip with --no-audit)

audit - Repo Health & Age Report

Scans every local clone and reports which repos are effectively empty and how old/active they are. Runs automatically at the end of sync/bootstrap, or on demand:

contextlake audit                       # summary to console + report to <cache_dir>/repo_audit.json
contextlake audit --report ./audit.json # choose where the per-repo JSON + .csv are written
contextlake sync --no-audit             # run sync without the audit step

It classifies each repo as empty (no commits or files), readme-only (just a template README), boilerplate (only meta files), or content. For each repo it also reports the creation date (GitLab created_at, captured during fetch, falling back to the first git commit) and the last commit date (from the local clone). The aggregate summary covers counts, the oldest and newest repos, how many are stale (no commits in over 1 or 2 years), and how many have no commits at all. The full per-repo table is written as JSON and CSV. The scan runs in parallel, is read-only, and works offline from the fetch cache.
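The four categories can be sketched as a classifier over a repo's tracked file names (for example from git ls-files) and its commit count. META_NAMES and classify are assumptions for illustration: the audit's actual meta-file list is not documented here, and it may also inspect README contents to recognize a template.

```python
from pathlib import PurePosixPath

# Hypothetical list of "meta" files; the audit's real list is not documented.
META_NAMES = {".gitignore", ".gitattributes", ".gitlab-ci.yml",
              "license", "license.md", "changelog.md", "contributing.md"}


def classify(tracked_files, commit_count):
    """Classify a repo as empty, readme-only, boilerplate, or content."""
    if commit_count == 0 or not tracked_files:
        return "empty"
    names = [PurePosixPath(f).name.lower() for f in tracked_files]
    if all(name.startswith("readme") for name in names):
        return "readme-only"
    if all(name.startswith("readme") or name in META_NAMES for name in names):
        return "boilerplate"
    return "content"
```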

Configuration

Using Configuration Files

The tool supports configuration files for persistent settings. Settings are resolved in the following precedence order, from highest to lowest priority:

  1. CLI arguments: override all other settings (highest priority)
  2. Local config: .contextlake.ini in the current directory
  3. Global config: ~/.contextlake.ini in your home directory
  4. Default values: built-in defaults (lowest priority)
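This precedence maps naturally onto Python's collections.ChainMap, where the first mapping that has a key wins. A sketch, assuming every file layer is read from a [contextlake] section and that unset CLI flags arrive as None; DEFAULTS is an excerpt of the Settings reference, resolve_settings is a hypothetical name, and the legacy gitlab-sync files and --config are not handled.

```python
import configparser
import tempfile
from collections import ChainMap
from pathlib import Path

# Excerpt of the built-in defaults from the Settings reference (values as strings).
DEFAULTS = {"work_dir": "~/work", "cache_dir": "/tmp", "max_workers": "8"}


def read_ini(path):
    """Return the [contextlake] section of an INI file as a dict ({} if missing)."""
    parser = configparser.ConfigParser()
    parser.read(Path(path).expanduser())  # a missing file is silently skipped
    if not parser.has_section("contextlake"):
        return {}
    return dict(parser["contextlake"])


def resolve_settings(cli_args, local_ini=".contextlake.ini",
                     global_ini="~/.contextlake.ini"):
    """Layer settings so that CLI > local file > global file > defaults."""
    cli = {key: value for key, value in cli_args.items() if value is not None}
    return ChainMap(cli, read_ini(local_ini), read_ini(global_ini), DEFAULTS)


def demo():
    """Resolve settings against two throwaway INI files."""
    with tempfile.TemporaryDirectory() as tmp:
        global_file = Path(tmp, "global.ini")
        global_file.write_text("[contextlake]\nwork_dir = ~/global\nmax_workers = 4\n")
        local_file = Path(tmp, "local.ini")
        local_file.write_text("[contextlake]\nwork_dir = ~/local\n")
        settings = resolve_settings({"max_workers": None}, local_file, global_file)
        return settings["work_dir"], settings["max_workers"], settings["cache_dir"]


if __name__ == "__main__":
    print(demo())  # ('~/local', '4', '/tmp')
```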

Upgrading from gitlab-sync? Your existing ~/.gitlab_sync.ini or .gitlab_sync.ini (with its [gitlab_sync] section) is still read, and the knowledge store at ~/.gitlab-sync/ is reused as-is, so there is nothing to migrate. New setups use .contextlake.ini and ~/.contextlake/. The gitlab-sync command also still works as a deprecated alias.

Example configuration file (.contextlake.ini):

[contextlake]
work_dir = ~/work
gitlab_group = your-gitlab-group
cache_dir = /tmp
clone_timeout = 300
fetch_timeout = 60
branch_timeout = 30
pull_timeout = 60
max_workers = 8

Custom Work Directory

# Using config file (recommended)
# Edit .contextlake.ini and set work_dir

# Or override with CLI argument
contextlake --work-dir /path/to/workspace sync

Custom GitLab Group

# Using config file (recommended)
# Edit .contextlake.ini and set gitlab_group

# Or override with CLI argument
contextlake --group my-gitlab-group sync

Combined Options

contextlake --work-dir /home/user/dev --group your-gitlab-group status

Custom Config File

contextlake --config /path/to/custom.ini sync

Settings reference

| Setting | Description | Default | Example |
|---|---|---|---|
| work_dir | Working directory for repositories | ~/work | /home/user/projects |
| gitlab_group | GitLab group to synchronize | your-gitlab-group | mycompany-group |
| cache_dir | Directory for cache files | /tmp | ~/.cache/contextlake |
| cache_file | Name of the projects cache file | gitlab_projects.txt | projects.txt |
| cache_json | Name of the JSON cache file | gitlab_projects.json | projects.json |
| clone_timeout | Clone operation timeout (seconds) | 300 | 600 |
| fetch_timeout | Fetch operation timeout (seconds) | 60 | 120 |
| branch_timeout | Branch operation timeout (seconds) | 30 | 60 |
| pull_timeout | Pull operation timeout (seconds) | 60 | 120 |
| max_workers | Maximum parallel workers | 8 | 4 |
| clean_corrupted | Auto-remove corrupted directories | true | false |
| max_retries | Maximum retry attempts for failed operations | 3 | 5 |
| backoff_initial | Initial backoff time (seconds) | 1 | 2 |
| backoff_max | Maximum backoff time (seconds) | 30 | 60 |
| adaptive_workers | Enable the adaptive worker pool | true | false |
| min_workers | Minimum workers for the adaptive pool | 2 | 4 |
| error_threshold | Error-rate threshold for adaptive workers | 0.5 | 0.3 |

The branch-safety settings (require_clean_workspace, protect_working_branches, safe_branches, auto_stash) are documented in their own section; see Branch Safety.
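The table gives the retry count and the initial and maximum backoff, but not the growth curve. A sketch assuming the common choice of doubling the delay on each retry, capped at backoff_max; backoff_delays and with_retries are hypothetical names, and retrying only on OSError is an illustrative default.

```python
import time


def backoff_delays(max_retries=3, backoff_initial=1.0, backoff_max=30.0):
    """Wait before each retry: doubles from backoff_initial, capped at backoff_max."""
    return [min(backoff_max, backoff_initial * 2 ** attempt)
            for attempt in range(max_retries)]


def with_retries(operation, max_retries=3, backoff_initial=1.0, backoff_max=30.0,
                 retry_on=(OSError,), sleep=time.sleep):
    """Call operation(), retrying on retry_on errors with exponential backoff.

    Makes at most max_retries + 1 attempts; the last failure propagates.
    """
    for delay in backoff_delays(max_retries, backoff_initial, backoff_max):
        try:
            return operation()
        except retry_on:
            sleep(delay)
    return operation()
```

With max_retries = 5, backoff_initial = 2, and backoff_max = 10, backoff_delays gives waits of 2, 4, 8, 10, and 10 seconds.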

Branch Safety

The tool protects your local work without getting in your way. The guiding rule: a clean repo is always safe to act on. For update, the branch name alone never causes a skip; what blocks an update is a dirty working tree (a detached HEAD is also skipped for pulls).

Branch-safety decision: a repo with a dirty working tree is skipped (or, for update, stashed first if auto_stash is on); when protect_working_branches is set, branches leaves a repo on a non-safe branch where it is; otherwise contextlake acts: update pulls and branches switches.

Safety Checks

  1. Clean Workspace Check (the main guard): detects a dirty working tree, meaning uncommitted (staged or unstaged) or untracked changes. A dirty repo is skipped by both update and branches, so local work is never clobbered.
  2. Automatic Stashing: optionally stashes a dirty tree so update can proceed instead of skipping.
  3. Working-Branch Protection (applies to branches only): keeps the branches command from switching a repo off a branch outside safe_branches, so you are never moved off a feature branch you are working on. This does not affect update: a clean feature branch is still pulled.
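A common way to detect a dirty working tree is git status --porcelain, which prints one line per staged, unstaged, or untracked path and nothing at all for a clean tree. A sketch, assuming contextlake does something equivalent; is_dirty and workspace_is_clean are hypothetical names.

```python
import subprocess


def is_dirty(porcelain_output):
    """True if `git status --porcelain` listed any staged, unstaged, or untracked path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def workspace_is_clean(repo, timeout=30, run=subprocess.run):
    """Run git status in repo and report whether the working tree is clean."""
    result = run(
        # --untracked-files=normal lists untracked files even if the user's
        # config sets status.showUntrackedFiles=no
        ["git", "-C", str(repo), "status", "--porcelain", "--untracked-files=normal"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return not is_dirty(result.stdout)
```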

Configuration

| Setting | Description | Default |
|---|---|---|
| require_clean_workspace | Skip repos with a dirty working tree (the main guard) | true |
| protect_working_branches | Keep branches from switching a repo off a non-safe branch | true |
| safe_branches | Branches the branches command may switch away from | main,master,develop,development |
| auto_stash | Stash a dirty tree before update instead of skipping | false |

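Behavior

update (fetch + fast-forward the current branch): a repo with a dirty working tree is skipped, or stashed first and then pulled when auto_stash is enabled. A clean repo is pulled whatever branch it is on, feature branches included; a detached HEAD is skipped for pulls rather than failing.

branches (switch to the most active branch): a repo with a dirty working tree is skipped. With protect_working_branches enabled, a repo on a branch outside safe_branches stays where it is; otherwise it is switched to its most active branch.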
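The rules above fit in a small decision function. A sketch under the documented defaults; decide and Action are hypothetical names, detached-HEAD handling is left out, and how auto_stash behaves when require_clean_workspace is false is an assumption.

```python
from enum import Enum


class Action(Enum):
    SKIP_DIRTY = "skip: dirty working tree"
    STASH_THEN_PULL = "stash, then pull"
    PULL = "pull"
    STAY_ON_WORKING_BRANCH = "skip: on working branch"
    SWITCH = "switch to most active branch"


def decide(command, dirty, current_branch, *,
           require_clean_workspace=True, auto_stash=False,
           protect_working_branches=True,
           safe_branches=("main", "master", "develop", "development")):
    """Mirror the documented branch-safety rules for `update` and `branches`."""
    if dirty and require_clean_workspace:
        if command == "update" and auto_stash:
            return Action.STASH_THEN_PULL
        return Action.SKIP_DIRTY
    if command == "update":
        return Action.PULL  # the branch name never blocks an update
    if protect_working_branches and current_branch not in safe_branches:
        return Action.STAY_ON_WORKING_BRANCH
    return Action.SWITCH
```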

Example Scenarios

Scenario 1: Working-Branch Protection (branches command)

# Repository is on feature/my-feature branch (not in safe branches)
contextlake branches

# Output:
# [2026-06-16 10:00:00] ⊘ backend/services/api-gateway: Skipped branch switch (on working branch: feature/my-feature)

A plain contextlake update would instead pull feature/my-feature here, since the working tree is clean.

Scenario 2: Uncommitted Changes

# Repository has uncommitted changes
contextlake update

# Output:
# [2026-06-16 10:00:00] ⊘ backend/services/api-gateway: Skipped (unsafe: Uncommitted changes detected)

Scenario 3: Auto-Stash Enabled

# Repository has uncommitted changes, auto-stash enabled
contextlake --auto-stash update

# Output:
# [2026-06-16 10:00:00] ⚠ backend/services/api-gateway: Changes stashed successfully
# [2026-06-16 10:00:00] ✓ backend/services/api-gateway: Updated main

Customization

You can customize branch safety behavior via configuration or CLI:

# In .contextlake.ini
[contextlake]
protect_working_branches = true
safe_branches = main,master,develop,staging
require_clean_workspace = true
auto_stash = false

# Or via CLI
contextlake --safe-branches main,master,develop,staging --auto-stash update

Disabling Safety Checks

If you want to disable safety checks (not recommended for production workflows):

# Disable all safety checks
contextlake --no-protect-working-branches --no-require-clean-workspace update

Warning: Disabling safety checks can lead to conflicts, lost work, or corruption of your local branches. Only disable if you understand the risks.

Scheduling & automation

Prerequisites for Cron Jobs

Before setting up cron jobs, ensure you have:

  1. Configuration file: create ~/.contextlake.ini with your settings.

cp .contextlake.ini ~/.contextlake.ini
# Edit it to set your work_dir and gitlab_group
nano ~/.contextlake.ini

  2. Absolute path to the executable: cron runs with a minimal PATH, so use absolute paths.

which contextlake
# Note the path, e.g. /usr/bin/contextlake

  3. Test the command manually first:

cd /home/user/work && contextlake sync

Basic Daily Sync

Run a full synchronization daily at 2 AM:

# Edit crontab
crontab -e

# Add the following line (replace paths as needed)
0 2 * * * cd /home/user/work && /usr/bin/contextlake sync >> /tmp/contextlake.log 2>&1

Note: This uses the configuration from ~/.contextlake.ini. No need to specify work_dir or gitlab_group in the cron command.

Hourly Updates (No Branch Switching)

Update repositories hourly without changing branches (for CI/CD environments):

0 * * * * cd /home/user/work && /usr/bin/contextlake update >> /tmp/gitlab_hourly.log 2>&1

Weekly Full Sync with Branch Management

Run full sync including branch switching weekly on Sunday at 3 AM:

0 3 * * 0 cd /home/user/work && /usr/bin/contextlake sync >> /tmp/gitlab_weekly.log 2>&1

Multiple Workspaces

For multiple workspaces, use separate config files:

# Create workspace-specific config files
cat > ~/.contextlake_primary.ini << EOF
[contextlake]
work_dir = ~/work
gitlab_group = example-group-primary
EOF

cat > ~/.contextlake_secondary.ini << EOF
[contextlake]
work_dir = ~/Projects/Secondary
gitlab_group = example-group-secondary
EOF

# Add to crontab

# Sync primary workspace daily
0 2 * * * cd /home/user/work && /usr/bin/contextlake --config ~/.contextlake_primary.ini sync >> /tmp/gitlab_primary.log 2>&1

# Sync secondary workspace every 6 hours
0 */6 * * * cd /home/user/work && /usr/bin/contextlake --config ~/.contextlake_secondary.ini update >> /tmp/gitlab_secondary.log 2>&1

Monitoring and Alerts

Add email notifications for failures:

# Create a wrapper script
mkdir -p /home/user/scripts
cat > /home/user/scripts/contextlake_wrapper.sh << 'EOF'
#!/bin/bash
# Use absolute paths: cron runs with a minimal PATH
cd /home/user/work || exit 1
/usr/bin/contextlake sync >> /tmp/contextlake.log 2>&1
EXIT_CODE=$?

if [ $EXIT_CODE -ne 0 ]; then
    # Replace you@example.com with your own address
    echo "GitLab sync failed with exit code $EXIT_CODE" | mail -s "GitLab Sync Failure" you@example.com
fi
EOF
chmod +x /home/user/scripts/contextlake_wrapper.sh

# Add to crontab
0 2 * * * /home/user/scripts/contextlake_wrapper.sh

Log Rotation

To prevent log files from growing indefinitely, set up log rotation:

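# Create logrotate configuration. sudo must apply to tee: with `sudo cat > file`
# the redirection runs as your own user and fails with "Permission denied".
# The su line is needed because /tmp is world-writable; replace user with the
# account that runs the cron jobs.
sudo tee /etc/logrotate.d/contextlake > /dev/null << 'EOF'
/tmp/contextlake.log /tmp/gitlab_*.log {
    su user user
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0644 user user
}
EOF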

Troubleshooting

| Symptom | What to do |
|---|---|
| "Cache file not found" | Run contextlake fetch first to populate the projects cache. |
| "Permission denied" during cloning | Make sure glab is authenticated (glab auth login) and that you have access to the repositories. |
| "Timeout" errors | Raise the relevant *_timeout settings, check connectivity, or lower max_workers (set it to 1 to run serially). Behind a TLS-inspecting proxy, set GITLAB_TOKEN so enumeration uses the built-in HTTP client. |
| "Detached HEAD" states | Handled automatically: the repo is skipped for pulls rather than failing. |
| Nested .git directories | A repo was cloned into a subfolder of itself. contextlake verify flags it; fix it by moving the inner tree up one level and removing the empty folder. |
| Cron job not running | Check crontab -l, use absolute paths, and test the exact command in a shell first; inspect the cron log (grep CRON /var/log/syslog). See Scheduling & automation. |
| Large log files | Set up log rotation; see Scheduling & automation. |

Best Practices

  1. Initial Setup: run contextlake sync once to set up the full workspace.
  2. Regular Updates: use contextlake update for frequent, fast updates.
  3. Branch Management: run contextlake branches periodically to stay on active branches.
  4. Monitoring: check logs regularly for errors or failures.
  5. Backup: commit or push local work in your repositories before major branch switches.
  6. Testing: test cron commands manually before adding them to crontab.
  7. Documentation: keep this documentation updated with any custom configurations.
