Usage & configuration
The reference for the Mirror layer: every command, plus configuration, branch safety, and scheduling. New here? Start with QUICKSTART. For the knowledge layer, see knowledge-layer.md.
Command reference
contextlake sync runs the whole mirror pipeline end to end; each stage is also
available as its own command:
status - Check Current Synchronization Status
Shows the current state of your workspace compared to GitLab.
contextlake status
Example output:
GitLab projects (cached): 128 # repos you can see on GitLab
Local repositories: 128 # repos cloned in your workspace
Synchronized: 127 # present in both and matching
Missing: 1 # on GitLab but not cloned yet
Extra: 1 # cloned locally but not on GitLab
- Missing = a repo exists on GitLab but isn't in your workspace; clone (or sync) will fetch it.
- Extra = a repo is in your workspace but not on GitLab, usually one that was renamed, archived, or removed there; contextlake leaves it alone for you to review.
A fully synced workspace shows 0 for both.
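The Missing and Extra counts can be read as two set differences between the cached GitLab project list and the local clones. The following is a minimal sketch of that idea, not contextlake's actual code; the function name workspace_status and the sample paths are assumptions made for illustration.

```python
def workspace_status(gitlab_projects, local_repos):
    """Compare two collections of repo paths and return the status figures."""
    remote, local = set(gitlab_projects), set(local_repos)
    return {
        "gitlab": len(remote),
        "local": len(local),
        "synchronized": len(remote & local),
        "missing": sorted(remote - local),  # on GitLab but not cloned yet
        "extra": sorted(local - remote),    # cloned locally but not on GitLab
    }


# Hypothetical sample data: one repo not cloned yet, one left over locally.
status = workspace_status(
    ["grp/api", "grp/web", "grp/docs"],
    ["grp/api", "grp/web", "grp/old-tool"],
)
print(status["missing"], status["extra"])
```

A fully synced workspace corresponds to both lists being empty.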
fetch - Fetch All GitLab Projects
Retrieves all repositories from the specified GitLab group and caches them locally.
contextlake fetch
This command:
- Uses the GitLab API with pagination to fetch all projects
- Includes subgroups automatically
- Skips archived repositories
- Caches results in /tmp/gitlab_projects.txt and /tmp/gitlab_projects.json (with the default cache_dir)
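The pagination loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: get_page stands in for whatever HTTP or glab call contextlake makes, and the fake data exists only so the example runs offline. In GitLab's REST API, the endpoint that lists a group's projects accepts page, per_page, include_subgroups, and archived query parameters, which is what such a callable would typically pass along.

```python
def fetch_all_projects(get_page, per_page=100):
    """Collect projects page by page until a short (or empty) page arrives."""
    projects, page = [], 1
    while True:
        batch = get_page(page=page, per_page=per_page)
        # Drop archived projects, mirroring the "skips archived" behavior.
        projects.extend(p for p in batch if not p.get("archived", False))
        if len(batch) < per_page:
            return projects
        page += 1


# Offline stand-in for the real API call (hypothetical data).
_FAKE = [
    {"path_with_namespace": f"grp/repo{i}", "archived": i == 3}
    for i in range(5)
]


def fake_get_page(page, per_page):
    start = (page - 1) * per_page
    return _FAKE[start:start + per_page]


projects = fetch_all_projects(fake_get_page, per_page=2)
print([p["path_with_namespace"] for p in projects])
```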
clone - Clone Missing Repositories
Clones any repositories that exist in GitLab but are missing locally.
contextlake clone
This command:
- Compares cached GitLab projects with local repositories
- Creates directory structure matching GitLab's group/subgroup hierarchy
- Uses HTTPS cloning for better authentication
- Clones up to 8 repositories concurrently
- Handles timeouts gracefully (300s per repository)
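A rough sketch of the two ideas above, mapping a GitLab path onto the local directory tree and cloning several repositories at once, might look like this. The names local_path_for, clone_missing, and fake_clone are hypothetical, and whether contextlake keeps the full group prefix under work_dir is an assumption. A real clone_one would typically wrap subprocess.run with a timeout of 300 seconds.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def local_path_for(work_dir, project_path):
    """Map a GitLab path such as 'group/sub/repo' to a directory under work_dir."""
    return Path(work_dir).joinpath(*project_path.split("/"))


def clone_missing(missing, work_dir, clone_one, max_workers=8):
    """Call clone_one(project_path, target_dir) for each missing repo in parallel."""
    targets = [(p, local_path_for(work_dir, p)) for p in missing]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda t: clone_one(*t), targets)
        return dict(zip(missing, results))


# Hypothetical stand-in so the sketch runs without network access. A real
# version might run: subprocess.run(["git", "clone", url, str(target)], timeout=300)
def fake_clone(project_path, target):
    return f"would clone {project_path} into {target}"


report = clone_missing(["grp/backend/api", "grp/web"], "/tmp/work", fake_clone)
for project, outcome in report.items():
    print(project, "->", outcome)
```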
update - Update Existing Repositories
Fetches and pulls the latest changes for all local repositories.
contextlake update
This command:
- Fetches all remote branches
- Updates the current branch with latest changes from origin
- Handles detached HEAD states appropriately
- Reports repositories that are already up to date
branches - Switch to Most Active Branches
Analyzes all repositories and switches them to their most active development branch.
contextlake branches
This command:
- Fetches all remote branches for each repository
- Calculates commit count for each branch
- Identifies the branch with the most commits (most active)
- Switches to the most active branch if different from current
- Pulls latest changes after switching
Branch Selection Criteria:
- Primary: Commit count (more commits = more active development)
- Secondary: Latest commit date (used as tiebreaker)
- Skips: Archived repositories, repositories without branches, detached HEAD states
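The selection criteria above amount to ordering branches by commit count and then by latest commit date. A minimal sketch, assuming each branch has already been summarized into a dictionary (the field names and sample values are made up for illustration; commit counts could come from something like git rev-list --count):

```python
from datetime import datetime


def most_active_branch(branches):
    """Pick the branch with the most commits; break ties by newest commit."""
    if not branches:
        return None  # repositories without branches are skipped
    best = max(branches, key=lambda b: (b["commits"], b["last_commit"]))
    return best["name"]


# Hypothetical branch summaries for one repository.
branches = [
    {"name": "main", "commits": 420, "last_commit": datetime(2026, 5, 1)},
    {"name": "develop", "commits": 420, "last_commit": datetime(2026, 6, 10)},
    {"name": "feature/x", "commits": 57, "last_commit": datetime(2026, 6, 15)},
]
print(most_active_branch(branches))  # develop: same count as main, newer commit
```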
verify - Verify Repository Structure
Checks that the local workspace structure matches GitLab exactly.
contextlake verify
This command:
- Compares local repositories with GitLab project list
- Identifies nested .git directory structures (indicates incorrect cloning)
- Lists extra local repositories (not in GitLab)
- Lists missing repositories (in GitLab but not local)
- Reports synchronization status
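To make the nested .git check concrete, here is one way such a scan could work: walk the workspace, record every directory that contains a .git directory, and flag any repository whose path lies inside another one. This is a sketch, not the verify command's actual logic; find_nested_git is a hypothetical name. Git submodules normally use a .git file rather than a directory, so this approach would not flag them.

```python
import os
import tempfile


def find_nested_git(work_dir):
    """Return paths of .git directories located inside another repository."""
    repos = []
    for root, dirs, _ in os.walk(work_dir):
        if ".git" in dirs:
            repos.append(root)
            dirs.remove(".git")  # do not descend into git internals
    nested = []
    for repo in sorted(repos):
        # A repo is nested if another repo root is one of its ancestors.
        if any(repo != outer and repo.startswith(outer + os.sep) for outer in repos):
            nested.append(os.path.join(repo, ".git"))
    return nested


# Build a throwaway workspace with one deliberately bad clone.
with tempfile.TemporaryDirectory() as work:
    os.makedirs(os.path.join(work, "grp", "api", ".git"))
    os.makedirs(os.path.join(work, "grp", "web", ".git"))
    os.makedirs(os.path.join(work, "grp", "web", "web", ".git"))  # repo inside itself
    nested = [os.path.relpath(p, work) for p in find_nested_git(work)]

print(nested)
```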
sync - Full Synchronization
Runs the complete synchronization pipeline in sequence.
contextlake sync
This command executes:
1. fetch - Get latest GitLab project list
2. clone - Clone missing repositories
3. update - Update existing repositories
4. branches - Switch to active branches
5. verify - Verify structure
6. audit - Report repo health & age (skip with --no-audit)
audit - Repo health & age report
Scans every local clone and reports which repos are effectively empty and how old/active
they are. Runs automatically at the end of sync/bootstrap, or on demand:
contextlake audit # summary to console + report to <cache_dir>/repo_audit.json
contextlake audit --report ./audit.json # choose where the per-repo JSON + .csv are written
contextlake sync --no-audit # run sync without the audit step
It classifies each repo as empty (no commits/files), readme-only (just a template
README), boilerplate (only meta files), or content, and reports each repo's
creation date (GitLab created_at, captured during fetch; falls back to the first git
commit) and last commit date (from the local clone), with an aggregate summary
(counts, oldest/newest, how many stale over 1–2 years, repos with no commits). The full
per-repo table is written as JSON and CSV. The scan is parallel, read-only, and works
offline from the fetch cache.
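The four categories can be pictured as a cascade of checks on a repo's file list. The sketch below is only an approximation: the META_FILES set is a hypothetical guess at what "meta files" means, and it does not try to detect whether a README is a template, which the real audit may do.

```python
# Hypothetical definition of "meta files"; the tool's real list is not documented here.
META_FILES = {"readme.md", "license", ".gitignore", ".gitattributes", ".editorconfig"}


def classify_repo(files, commit_count):
    """Classify a repo as empty, readme-only, boilerplate, or content."""
    names = {f.lower() for f in files}
    if commit_count == 0 or not names:
        return "empty"
    if names <= {"readme.md"}:
        return "readme-only"
    if names <= META_FILES:
        return "boilerplate"
    return "content"


print(classify_repo([], 0))
print(classify_repo(["README.md"], 1))
print(classify_repo(["README.md", ".gitignore", "LICENSE"], 3))
print(classify_repo(["README.md", "src/main.py"], 10))
```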
Configuration
Using Configuration Files
The tool supports configuration files for persistent settings. Settings are resolved in the following precedence order, highest priority first:
- CLI arguments: Override all other settings (highest priority)
- Local config: .contextlake.ini in the current directory
- Global config: ~/.contextlake.ini in the home directory
- Default values: Built-in defaults (lowest priority)
Upgrading from gitlab-sync? Your existing ~/.gitlab_sync.ini / .gitlab_sync.ini (with its [gitlab_sync] section) is still read, and the knowledge store at ~/.gitlab-sync/ is reused as-is; there is nothing to migrate. New setups use .contextlake.ini and ~/.contextlake/; the gitlab-sync command also still works as a deprecated alias.
Example configuration file (.contextlake.ini):
[contextlake]
work_dir = ~/work
gitlab_group = your-gitlab-group
cache_dir = /tmp
clone_timeout = 300
fetch_timeout = 60
branch_timeout = 30
pull_timeout = 60
max_workers = 8
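To illustrate how layered settings of this kind are usually resolved (defaults first, then the global file, then the local file, then CLI arguments), here is a small sketch using Python's standard configparser. It is not contextlake's loader; resolve_settings and the trimmed DEFAULTS dictionary are assumptions, and the INI text is passed as strings so the example runs without touching the filesystem.

```python
import configparser

# A trimmed-down, illustrative subset of the built-in defaults.
DEFAULTS = {"work_dir": "~/work", "cache_dir": "/tmp", "max_workers": "8"}


def resolve_settings(global_ini, local_ini, cli_args):
    """Layer settings: defaults < global config < local config < CLI arguments."""
    settings = dict(DEFAULTS)
    for text in (global_ini, local_ini):  # lower priority first
        parser = configparser.ConfigParser()
        parser.read_string(text)
        if parser.has_section("contextlake"):
            settings.update(parser["contextlake"])
    # Only CLI options that were actually given override the files.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings


global_ini = "[contextlake]\nwork_dir = ~/work\nmax_workers = 4\n"
local_ini = "[contextlake]\nmax_workers = 2\n"
settings = resolve_settings(
    global_ini, local_ini, {"work_dir": "/path/to/workspace", "gitlab_group": None}
)
print(settings)
```

Here max_workers ends up as 2 (local beats global), work_dir comes from the CLI, and cache_dir keeps its default.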
Custom Work Directory
# Using config file (recommended)
# Edit .contextlake.ini and set work_dir
# Or override with CLI argument
contextlake --work-dir /path/to/workspace sync
Custom GitLab Group
# Using config file (recommended)
# Edit .contextlake.ini and set gitlab_group
# Or override with CLI argument
contextlake --group my-gitlab-group sync
Combined Options
contextlake --work-dir /home/user/dev --group your-gitlab-group status
Custom Config File
contextlake --config /path/to/custom.ini sync
Settings reference
| Setting | Description | Default | Example |
|---|---|---|---|
| work_dir | Working directory for repositories | ~/work | /home/user/projects |
| gitlab_group | GitLab group to synchronize | your-gitlab-group | mycompany-group |
| cache_dir | Directory for cache files | /tmp | ~/.cache/contextlake |
| cache_file | Name of projects cache file | gitlab_projects.txt | projects.txt |
| cache_json | Name of JSON cache file | gitlab_projects.json | projects.json |
| clone_timeout | Clone operation timeout (seconds) | 300 | 600 |
| fetch_timeout | Fetch operation timeout (seconds) | 60 | 120 |
| branch_timeout | Branch operation timeout (seconds) | 30 | 60 |
| pull_timeout | Pull operation timeout (seconds) | 60 | 120 |
| max_workers | Maximum parallel workers | 8 | 4 |
| clean_corrupted | Auto-remove corrupted directories | true | false |
| max_retries | Maximum retry attempts for failed operations | 3 | 5 |
| backoff_initial | Initial backoff time (seconds) | 1 | 2 |
| backoff_max | Maximum backoff time (seconds) | 30 | 60 |
| adaptive_workers | Enable adaptive worker pool | true | false |
| min_workers | Minimum workers for adaptive pool | 2 | 4 |
| error_threshold | Error rate threshold for adaptive workers | 0.5 | 0.3 |
The branch-safety settings (require_clean_workspace, protect_working_branches,
safe_branches, auto_stash) are documented in their own section below; see
Branch Safety.
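The retry settings (max_retries, backoff_initial, backoff_max) suggest a capped backoff schedule. The reference does not say how the delay grows between attempts, so the doubling used below is an assumption; the sketch only shows how the three values would interact under that assumption.

```python
def backoff_delays(max_retries=3, backoff_initial=1.0, backoff_max=30.0):
    """Delay before each retry: start at backoff_initial, double, cap at backoff_max."""
    return [
        min(backoff_initial * 2 ** attempt, backoff_max)
        for attempt in range(max_retries)
    ]


print(backoff_delays())               # [1.0, 2.0, 4.0]
print(backoff_delays(max_retries=7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With the defaults, three retries would wait about 1, 2, and 4 seconds; the cap only matters once max_retries is raised.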
Branch Safety
The tool protects your local work without getting in your way. The guiding rule:
a clean repo is always safe to update, and the branch name alone never causes update to skip it.
The only thing that blocks an update is a dirty working tree.
Safety Checks
- Clean Workspace Check (the main guard): detects a dirty working tree (uncommitted, unstaged, or untracked changes). A dirty repo is skipped by both update and branches so local work is never clobbered.
- Automatic Stashing: optionally stashes a dirty tree so update can proceed instead of skipping.
- Working-Branch Protection (applies to branches only): keeps the branches command from switching a repo off a branch outside safe_branches, so you are never moved off a feature branch you are working on. This does not affect update; a clean feature branch is still pulled.
Configuration
| Setting | Description | Default |
|---|---|---|
| require_clean_workspace | Skip repos with a dirty working tree (the main guard) | true |
| protect_working_branches | Keep branches from switching a repo off a non-safe branch | true |
| safe_branches | Branches the branches command may switch away from | main,master,develop,development |
| auto_stash | Stash a dirty tree before update instead of skipping | false |
Behavior
update (fetch + fast-forward the current branch):
- A clean repo is updated on whatever branch it is on, feature branches included.
- A repo with a dirty working tree is skipped (or stashed first, if auto_stash is on).
branches (switch to the most active branch):
- A repo with a dirty working tree is skipped.
- With protect_working_branches = true, a repo on a branch outside safe_branches is left where it is instead of being switched away.
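The rules above can be summarized as two small decision functions. This is a sketch of the documented behavior under default settings, not the tool's code: the function names and return labels are made up, and what happens when require_clean_workspace is turned off is an assumption. Detecting a dirty tree is left out; one common way is to check whether git status --porcelain prints anything.

```python
SAFE_BRANCHES = ("main", "master", "develop", "development")


def decide_update(dirty, require_clean_workspace=True, auto_stash=False):
    """Decide what the update command does with one repository."""
    if dirty and auto_stash:
        return "stash-then-update"
    if dirty and require_clean_workspace:
        return "skip"
    return "update"  # a clean repo is updated on whatever branch it is on


def decide_branch_switch(dirty, current_branch, protect_working_branches=True,
                         safe_branches=SAFE_BRANCHES, require_clean_workspace=True):
    """Decide what the branches command does with one repository."""
    if dirty and require_clean_workspace:
        return "skip"
    if protect_working_branches and current_branch not in safe_branches:
        return "stay"  # never moved off a working branch
    return "switch"


print(decide_update(dirty=False))                                 # update
print(decide_update(dirty=True))                                  # skip
print(decide_update(dirty=True, auto_stash=True))                 # stash-then-update
print(decide_branch_switch(False, "feature/my-feature"))          # stay
print(decide_branch_switch(False, "main"))                        # switch
```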
Example Scenarios
Scenario 1: Working-Branch Protection (branches command)
# Repository is on feature/my-feature branch (not in safe branches)
contextlake branches
# Output:
# [2026-06-16 10:00:00] ⊘ backend/services/api-gateway: Skipped branch switch (on working branch: feature/my-feature)
A plain contextlake update would instead pull feature/my-feature here, since the working tree is clean.
Scenario 2: Uncommitted Changes
# Repository has uncommitted changes
contextlake update
# Output:
# [2026-06-16 10:00:00] ⊘ backend/services/api-gateway: Skipped (unsafe: Uncommitted changes detected)
Scenario 3: Auto-Stash Enabled
# Repository has uncommitted changes, auto-stash enabled
contextlake --auto-stash update
# Output:
# [2026-06-16 10:00:00] ⚠ backend/services/api-gateway: Changes stashed successfully
# [2026-06-16 10:00:00] ✓ backend/services/api-gateway: Updated main
Customization
You can customize branch safety behavior via configuration or CLI:
# In .contextlake.ini
[contextlake]
protect_working_branches = true
safe_branches = main,master,develop,staging
require_clean_workspace = true
auto_stash = false
# Or via CLI
contextlake --safe-branches main,master,develop,staging --auto-stash update
Disabling Safety Checks
If you want to disable safety checks (not recommended for production workflows):
# Disable all safety checks
contextlake --no-protect-working-branches --no-require-clean-workspace update
Warning: Disabling safety checks can lead to conflicts, lost work, or corruption of your local branches. Only disable if you understand the risks.
Scheduling & automation
Prerequisites for Cron Jobs
Before setting up cron jobs, ensure you have:
- Configuration file set up: Create ~/.contextlake.ini with your settings
cp .contextlake.ini ~/.contextlake.ini
# Edit with your work_dir and gitlab_group
nano ~/.contextlake.ini
- Absolute path to the executable: Cron runs with a minimal PATH, so use absolute paths
which contextlake # Note the path
# Example: /usr/bin/contextlake
- Test the command manually first:
cd /home/user/work && contextlake sync
Basic Daily Sync
Run a full synchronization daily at 2 AM:
# Edit crontab
crontab -e
# Add the following line (replace paths as needed)
0 2 * * * cd /home/user/work && /usr/bin/contextlake sync >> /tmp/contextlake.log 2>&1
Note: This uses the configuration from ~/.contextlake.ini. No need to specify work_dir or gitlab_group in the cron command.
Hourly Updates (No Branch Switching)
Update repositories hourly without changing branches (for CI/CD environments):
0 * * * * cd /home/user/work && /usr/bin/contextlake update >> /tmp/gitlab_hourly.log 2>&1
Weekly Full Sync with Branch Management
Run full sync including branch switching weekly on Sunday at 3 AM:
0 3 * * 0 cd /home/user/work && /usr/bin/contextlake sync >> /tmp/gitlab_weekly.log 2>&1
Multiple Workspaces
For multiple workspaces, use separate config files:
# Create workspace-specific config files
cat > ~/.contextlake_primary.ini << EOF
[contextlake]
work_dir = ~/work
gitlab_group = example-group-primary
EOF
cat > ~/.contextlake_secondary.ini << EOF
[contextlake]
work_dir = ~/Projects/Secondary
gitlab_group = example-group-secondary
EOF
# Add to crontab
# Sync primary workspace daily
0 2 * * * cd /home/user/work && /usr/bin/contextlake --config ~/.contextlake_primary.ini sync >> /tmp/gitlab_primary.log 2>&1
# Sync secondary workspace every 6 hours
0 */6 * * * cd /home/user/work && /usr/bin/contextlake --config ~/.contextlake_secondary.ini update >> /tmp/gitlab_secondary.log 2>&1
Monitoring and Alerts
Add email notifications for failures:
# Create a wrapper script
cat > /home/user/scripts/contextlake_wrapper.sh << 'EOF'
#!/bin/bash
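# Stop early if the workspace is missing; use an absolute path because cron's PATH is minimal
cd /home/user/work || exit 1
/usr/bin/contextlake sync >> /tmp/contextlake.log 2>&1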
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
echo "GitLab sync failed with exit code $EXIT_CODE" | mail -s "GitLab Sync Failure" [email protected]
fi
EOF
chmod +x /home/user/scripts/contextlake_wrapper.sh
# Add to crontab
0 2 * * * /home/user/scripts/contextlake_wrapper.sh
Log Rotation
To prevent log files from growing indefinitely, set up log rotation:
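# Create logrotate configuration
# (sudo tee is used because with "sudo cat > file" the redirect runs without root privileges)
sudo tee /etc/logrotate.d/contextlake > /dev/null << 'EOF'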
/tmp/contextlake.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0644 user user
}
EOF
Troubleshooting
| Symptom | What to do |
|---|---|
| "Cache file not found" | Run contextlake fetch first to populate the projects cache. |
| "Permission denied" during cloning | Make sure glab is authenticated (glab auth login) and you can reach the repositories. |
| "Timeout" errors | Raise the relevant *_timeout settings, check connectivity, or lower max_workers (set it to 1 to run serially). Behind a TLS-inspecting proxy, set GITLAB_TOKEN so enumeration uses the built-in HTTP client. |
| "Detached HEAD" states | Handled automatically: the repo is skipped for pulls rather than failing. |
| Nested .git directories | A repo cloned into a subfolder of itself. contextlake verify flags it; fix by moving the inner tree up one level and removing the empty folder. |
| Cron job not running | Check crontab -l, use absolute paths, and test the exact command in a shell first; inspect cron logs (grep CRON /var/log/syslog). See Scheduling & automation. |
| Large log files | Set up log rotation, see Scheduling & automation. |
Best Practices
- Initial Setup: Run contextlake sync once to set up the full workspace
- Regular Updates: Use contextlake update for frequent, fast updates
- Branch Management: Run contextlake branches periodically to stay on active branches
- Monitoring: Check logs regularly for errors or failures
- Backup: Commit workspace state to git before major branch switches
- Testing: Test cron commands manually before adding to crontab
- Documentation: Keep this documentation updated with any custom configurations
