Technical Documentation

JobSync MCP — Complete Reference

Every tool, every parameter, every config field. A self-sufficient guide to deploying and operating the JobSync Model Context Protocol server for automated job search pipelines.

v1.x · 20+ MCP tools · Node.js / TypeScript · MCP Server

Overview

JobSync MCP is a Model Context Protocol server that gives Claude (or any MCP-compatible LLM client) direct, structured access to job-search infrastructure — ATS APIs, an application pipeline, a deduplication cache, Airtable, and your personal profile. Instead of asking Claude to "find me jobs," you run automated multi-step workflows entirely inside the model context window: fetch → classify → deduplicate → store → track.

The server exposes 23 tools across eight functional domains. Each tool has a recommended model hint (⚡ haiku, ⚖ sonnet, 🔬 opus) so orchestrators can route calls to the cheapest capable model. All persistent state lives under ~/.jobsync/ — no cloud services required beyond Airtable (which is optional if you use the markdown sink).

JobSync MCP is not a browser scraper. It calls public ATS job-board APIs (Greenhouse, Lever, Ashby, Workday) directly. No Playwright, no HTML parsing, no rate-limit circumvention.

Architecture at a glance

LLM Client (Claude or any MCP host)
        │  MCP stdio transport
        ▼
jobsync-mcp server
        │
        ▼
ATS APIs · Airtable · SQLite · TSV

The server runs as a subprocess of the MCP host. All tool calls are synchronous JSON-over-stdio. The server reads ~/.jobsync/config.json on startup and validates required credentials based on the configured sink.
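
For orientation, each tool call on that transport is an MCP tools/call request in JSON-RPC 2.0 framing. The id, tool name, and argument values below are illustrative only:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "cache_is_seen",
    "arguments": { "applyLink": "https://boards.greenhouse.io/openai/jobs/4012345" }
  }
}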

Installation & Configuration

Install

npm install -g jobsync-mcp

# Initialise config interactively
jobsync-mcp init

The init command creates ~/.jobsync/config.json with your Airtable credentials and preferred settings. You can also write the file manually — the schema is documented below.

Config fields — ~/.jobsync/config.json

Field Type Default Description
airtable.pat string Airtable Personal Access Token. Required when sink is airtable or both. Needs scopes: data.records:write, schema.bases:read. Add schema.bases:write if you want airtable_create_base.
airtable.baseId string Airtable base ID (starts with app…). Required with airtable sink.
airtable.tableName string "Jobs" Name of the table within the base where job records are upserted.
airtable.fieldMap object undefined Optional override mapping from JobSync field names to your Airtable column names. Useful if you renamed columns in an existing base.
sink "airtable" | "markdown" | "both" "airtable" Where jobs are written. markdown writes to markdownPath. both writes to both simultaneously.
markdownPath string ~/.jobsync/jobs.md Absolute path for markdown output. Only used when sink includes markdown.
lookbackHours number 12 How far back (in hours) the scrape workflow considers a job "recently seen" for dedup purposes. Increase if you run scrapes less frequently.
usOnly boolean true When true, the location filter rejects roles with clearly non-US locations. Set to false for remote-global or international searches.
enableFastPath boolean false Gate for the four ATS fetcher tools. Set to true to enable fetch_greenhouse_jobs, fetch_lever_jobs, fetch_ashby_jobs, and fetch_workday_jobs.
profileDir string ~/.jobsync/profile Directory containing skills.md, experience.md, projects.md, roles.json, and raw-resume.txt.
brandedOutput boolean true When true, tool responses include JobSync branding headers. Set to false for plain JSON output.

The server validates credentials on startup. If sink requires Airtable but airtable.pat or airtable.baseId are missing, the server will throw immediately — not at call time.
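
As a worked example, a minimal config for the Airtable sink might look like the following. Values are placeholders, and the nested airtable object simply mirrors the dotted field names above:

{
  "airtable": {
    "pat": "patXXXXXXXXXXXXXX",
    "baseId": "appXXXXXXXXXXXXXX",
    "tableName": "Jobs"
  },
  "sink": "airtable",
  "lookbackHours": 24,
  "usOnly": true,
  "enableFastPath": true
}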

File system layout

~/.jobsync/
├── config.json # Server config (PAT, baseId, sink, …)
├── cache.db # SQLite dedup cache
├── pipeline.tsv # Application tracking (tab-separated)
├── portals.yml # Portal scanner config (search queries, companies)
├── jobs.md # Markdown sink output (if sink includes "markdown")
└── profile/
    ├── skills.md # Curated skill list
    ├── experience.md # Work history in markdown
    ├── projects.md # Notable projects
    ├── roles.json # Detected / custom / excluded job roles
    └── raw-resume.txt # Plaintext resume parsed from PDF/DOCX

Claude Desktop setup

Add the server to claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\):

{
  "mcpServers": {
    "jobsync": {
      "command": "npx",
      "args": ["-y", "jobsync-mcp"]
    }
  }
}

After saving, restart Claude Desktop. The server appears in the tool list as jobsync.

Profile & Resume Tools

The profile system stores your professional identity in structured markdown files. Claude reads these files to understand your background during fit-analysis and job-match scoring. You can bootstrap the profile by pasting your resume — the parser handles PDF, DOCX, TXT, and MD.

profile_read ⚡ haiku

Returns the full contents of every profile file in a single call. Use this at the start of a fit-analysis session so Claude has your background available in context.

Parameters

None.

Returns

Field Type Description
skills string Contents of skills.md
experience string Contents of experience.md
projects string Contents of projects.md
roles object Raw roles.json — detected, custom, excluded arrays
activeRoles string[] Computed: (detected ∪ custom) − excluded
rawResume string Contents of raw-resume.txt (empty string if absent)

profile_write_file ⚡ haiku

Overwrites one of the three profile markdown files. Use this after onboarding to save structured skill or experience content that Claude generated from your resume.

Parameters

Param Type Description
file "skills" | "experience" | "projects" (required) Which file to write.
content string (required) Full markdown content to write. Overwrites existing file.

profile_update_roles 🔬 opus

Manages the three-way role system: detected (auto-inferred from resume), custom (user-added), and excluded (suppressed from searches). The activeRoles set used for scraping is (detected ∪ custom) − excluded.

You can either replace entire arrays (detected, custom, excluded) or perform incremental mutations (addCustom, addExcluded, removeCustom, removeExcluded). Incremental mutations are preferred for interactive editing.
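
As a small illustration of the set arithmetic (file contents invented), a roles.json like this:

{
  "detected": ["Software Engineer", "Backend Engineer", "ML Engineer"],
  "custom": ["Developer Advocate"],
  "excluded": ["ML Engineer"]
}

would yield activeRoles = ["Software Engineer", "Backend Engineer", "Developer Advocate"]: the union of detected and custom, minus excluded.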

Parameters (all optional, at least one required)

Param Type Description
detected string[] Replace the full detected roles array (from resume parsing).
custom string[] Replace the full custom roles array.
excluded string[] Replace the full excluded roles array.
addCustom string[] Append to custom array without touching the rest.
addExcluded string[] Append to excluded array.
removeCustom string[] Remove specific entries from custom array.
removeExcluded string[] Remove specific entries from excluded array.

profile_parse_resume ⚖ sonnet

Parses a resume file into plaintext and saves it to raw-resume.txt in the profile directory. Supports PDF, DOCX, TXT, and MD formats. After parsing, run profile_update_roles to extract role keywords from the raw text.

Parameters

Param Type Description
path string (required) Absolute path to the resume file on the local filesystem.

Returns

{ success, savedPath, charCount } — path where raw-resume.txt was written and the character count of extracted text.

Portals Configuration Tools

The portals config (~/.jobsync/portals.yml) drives scrape workflows — it defines which companies to monitor and which search queries to run. These tools let Claude read, update, and initialise this file from a built-in master company list.

portals_read ⚡ haiku

Returns the raw YAML content of portals.yml plus an exists flag. If the file does not yet exist, exists is false and content is empty — use this to trigger onboarding.

Returns

Field Type Description
content string Raw YAML string, or empty string if file absent.
exists boolean Whether the file exists.

portals_write ⚡ haiku

Overwrites portals.yml with the provided YAML string. Used during onboarding after Claude selects relevant companies from the master list. Can also be called directly for manual edits.

Parameters

Param Type Description
content string (required) Full YAML content to write. Must not be empty.

portals_master_list ⚡ haiku

Returns the built-in YAML snippet covering ~60 companies across AI labs, developer tools, voice AI, SaaS, and fintech. During onboarding, Claude calls this, filters by the user's target roles, then passes the subset to portals_write.

Returns

Field Type Description
masterCompanyList string YAML snippet — all built-in companies with their ATS slugs.
path string Target path where portals.yml would be written.

Portals YAML structure

search_queries:
  - "software engineer AI"
  - "backend engineer"

tracked_companies:
  - name: Anthropic
    ats: ashby
    slug: anthropic
  - name: OpenAI
    ats: greenhouse
    slug: openai
  - name: Vercel
    ats: lever
    slug: vercel

Each entry in tracked_companies maps directly to a fast-path fetcher call. The ats field must be one of greenhouse, lever, ashby, or workday.
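
For example, the Anthropic entry above would translate into a single fetcher invocation; the payload shape follows the fetch_ashby_jobs reference below:

{
  "name": "fetch_ashby_jobs",
  "arguments": { "slug": "anthropic" }
}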

ATS Fast-Path Fetchers

These four tools call public ATS job-board APIs directly — no browser, no scraping. They return structured job arrays ready for classification and dedup. All four require enableFastPath: true in config.

Fast-path tools are disabled by default. Set "enableFastPath": true in ~/.jobsync/config.json to enable them.

fetch_greenhouse_jobs ⚡ haiku

Fetches open roles from the Greenhouse job board API at boards-api.greenhouse.io/v1/boards/{slug}/jobs. Supports one slug or an array for batch fetching.

Parameters

Param Type Description
slug string Single company slug (e.g. "openai").
slugs string[] Array of slugs for batch fetching. Provide either slug or slugs.

Returns

{ total, jobs[] } where each job has: id, positionTitle, company, location, applyLink, datePosted, jobBoard: "greenhouse".
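
An abbreviated response, with field values invented purely for illustration:

{
  "total": 1,
  "jobs": [
    {
      "id": "4012345",
      "positionTitle": "Software Engineer, Backend",
      "company": "OpenAI",
      "location": "San Francisco, CA",
      "applyLink": "https://boards.greenhouse.io/openai/jobs/4012345",
      "datePosted": "2024-05-01",
      "jobBoard": "greenhouse"
    }
  ]
}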

Notes

Link verification for Greenhouse uses the API (a 404 on /jobs/{id} means the role is closed) rather than a browser GET — faster and more reliable.

fetch_lever_jobs ⚡ haiku

Fetches postings from the Lever public API at api.lever.co/v0/postings/{slug}?mode=json.

Parameters

Param Type Description
slug string Single company slug (e.g. "vercel").
slugs string[] Array of slugs for batch fetching.

Returns

{ total, jobs[] } — each job: id, positionTitle, company, location, applyLink, datePosted, team, jobBoard: "lever".

Notes

Link verification checks the state field via the Lever API — a state of "closed" marks the link inactive.

fetch_ashby_jobs ⚡ haiku

Fetches postings from the Ashby HQ posting API at api.ashbyhq.com/posting-api/job-board/{slug}. Ashby is popular among AI-native companies (Anthropic, Perplexity, etc.).

Parameters

Param Type Description
slug string Single Ashby company slug.
slugs string[] Array of slugs for batch fetching.

Returns

{ total, jobs[] } — each job includes an isListed field; only listed jobs are returned.

Known quirk: Ashby returns null for datePosted. The server stamps today's date as a fallback before upserting to Airtable. Do not rely on this date for recency filtering.

fetch_workday_jobs ⚡ haiku

Fetches jobs from Workday's internal CXS (Candidate Experience) API. Accepts the full board URL rather than a slug, since Workday tenants have unique subdomains and paths.

Parameters

Param Type Description
boardUrl string Full Workday board URL (e.g. "https://company.wd1.myworkdayjobs.com/en-US/External").
boardUrls string[] Array of board URLs for batch fetching.

Notes

Workday's datePosted is also unreliable from the API payload — use rawFields.postedOn instead for accurate post dates. Link verification uses a GET request combined with soft-close pattern matching in the response body.

Filters & Classification

These tools form the classification pipeline. Run them after fetching to strip irrelevant roles before touching the cache or Airtable. For most workflows, classify_job_batch is the single call you need — it composes all filters internally.

filter_us_location ⚡ haiku

Determines whether a location string is likely non-US. Uses pattern matching against known country/city patterns — does not make a network call.

Parameters

Param Type Description
location string (required) Location string from job posting (e.g. "London, UK", "Remote").

Returns

{ likelyNonUS: boolean }

filter_title_keywords ⚡ haiku

Tests a job title (and optionally location) against include/exclude keyword lists. Returns passes: false if the title matches an exclude keyword or fails to match any include keyword.

Parameters

Param Type Description
title string (required) Job title to evaluate.
location string Location string — used for US-only check when usOnly is set.
include string[] Whitelist keywords. Title must match at least one. Case-insensitive.
exclude string[] Blacklist keywords. Title must match none. Case-insensitive.
usOnly boolean When true, non-US locations also fail the filter.

Returns

{ passes: boolean, title, location }
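
For example, the following arguments would come back with passes: false, because the title matches the exclude keyword "manager" even though it also matches the include keyword "engineer" (values illustrative):

{
  "title": "Senior Engineering Manager",
  "location": "Remote - US",
  "include": ["engineer"],
  "exclude": ["manager"],
  "usOnly": true
}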

detect_industry_tags ⚖ sonnet

Classifies a job into an industry and enriches it with tags. Uses a combination of heuristic pattern matching and LLM-assisted reasoning (hence sonnet). Also flags H1B sponsorship likelihood and identifies the originating job board.

Parameters

Param Type Description
company string (required) Company name.
positionTitle string (required) Job title.
applyLink string Apply URL — used to infer ATS and job board.
jobDescription string Full job description text for richer classification.

Returns

Field Type Description
industry string Classified industry (e.g. "AI / ML", "Fintech").
tags string Comma-separated tags (e.g. "FAANG+,YC,Remote").
h1bSponsor boolean Estimated H1B sponsorship likelihood.
jobBoard string Inferred originating job board.
industryOptions string[] Alternate industry classifications considered.

classify_job_batch ⚖ sonnet

The recommended single-call entry point for classification. Internally runs filter_us_location, filter_title_keywords, and detect_industry_tags for every job in the batch, returning a split of accepted vs. rejected with per-job reasoning.

Parameters

Param Type Description
jobs object[] (required) Array of job objects, each with at minimum positionTitle, company, applyLink. Optional: location, jobDescription.
include string[] Title include keywords. Defaults to active roles from profile if omitted.
exclude string[] Title exclude keywords.
usOnly boolean Defaults to config.usOnly.

Returns

Field Type Description
total number Input job count.
accepted number Jobs that passed all filters.
rejected number Jobs that failed at least one filter.
results object[] Each result: { job, passes, rejectReason?, industry, tags }.
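
A minimal call, with values invented for illustration:

{
  "jobs": [
    {
      "positionTitle": "Backend Engineer",
      "company": "Vercel",
      "applyLink": "https://jobs.lever.co/vercel/abc-123",
      "location": "Remote (US)"
    }
  ],
  "exclude": ["manager"],
  "usOnly": true
}

A job like this would land in results with passes: true plus whatever industry and tags the classifier assigns; a rejected job carries a rejectReason instead.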

Cache Layer

The dedup cache is a SQLite database at ~/.jobsync/cache.db. Every processed job is stamped here by its apply link. On subsequent scrape runs, seen links are skipped before hitting any classification or storage logic — keeping downstream calls cheap.

cache_is_seen ⚡ haiku

Checks whether one or more apply links have already been processed. Accepts a single URL or a batch array.

Parameters

Param Type Description
applyLink string Single URL to check.
applyLinks string[] Array of URLs to check in bulk.

Returns

Single: { seen: boolean, record? }record contains the cache entry if seen.
Batch: { results: Array<{ applyLink, seen, record? }> }
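
A batch response pairs each link with its cache status; URLs and record contents here are invented:

{
  "results": [
    { "applyLink": "https://jobs.lever.co/vercel/abc-123", "seen": false },
    {
      "applyLink": "https://boards.greenhouse.io/openai/jobs/4012345",
      "seen": true,
      "record": { "company": "OpenAI", "positionTitle": "Software Engineer, Backend" }
    }
  ]
}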

cache_mark_seen ⚡ haiku

Marks one or more jobs as seen in the cache. Call this after a job has been classified and stored — not before, so a mid-run crash doesn't permanently skip unprocessed jobs.

Parameters

Param Type Description
jobs object[] (required) Each entry: { applyLink, company, positionTitle }. All three are stored in the cache record.

Returns

{ marked: number, cacheSize: number }

cache_prune ⚡ haiku

Deletes cache entries older than the specified number of days. Run monthly or on a schedule to prevent the SQLite file from growing unbounded.

Parameters

Param Type Description
days number Entries older than this many days are deleted. Default: 90.

Returns

{ deleted: number, cacheSize: number }

Airtable Integration

Airtable is the default authoritative sink — the place where all passing jobs land after classification. The schema is fixed: the server creates exactly the columns it knows about and maps them consistently. You can inspect that schema at any time with airtable_get_schema.

airtable_upsert_job ⚖ sonnet

Creates or updates job records in Airtable. Uses applyLink as the dedup key — if a record with that link already exists it is updated in-place, otherwise a new record is created.

Parameters

Param Type Description
jobs object[] (required) Array of job objects. Required fields per job: positionTitle, company, applyLink. Optional: location, datePosted, industry, tags, fitScore, notes.

Returns

{ created: number, failed: number, rejected: number }rejected counts jobs that failed schema validation before even hitting the API.
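
An illustrative upsert payload (all values invented):

{
  "jobs": [
    {
      "positionTitle": "Backend Engineer",
      "company": "Vercel",
      "applyLink": "https://jobs.lever.co/vercel/abc-123",
      "location": "Remote (US)",
      "industry": "Developer Tools",
      "tags": "Remote",
      "fitScore": "8"
    }
  ]
}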

Always call cache_mark_seen after a successful upsert batch, not before. This ensures that a failed upsert doesn't permanently skip a job on the next run.

airtable_list_recent_jobs ⚡ haiku

Fetches recently created Airtable records for dedup reconciliation. Use this when the SQLite cache has been cleared or on a new machine — pull existing Airtable records then re-seed the cache.

Parameters

Param Type Description
lookbackDays number How many days back to fetch records. Default: 14.
maxRecords number Maximum number of records to return. Default: 500.

airtable_get_schema ⚡ haiku

Returns the field map the server uses when writing to Airtable — useful for debugging column name mismatches or understanding the exact output format.

Parameters

None.

airtable_list_bases ⚡ haiku

Lists all Airtable bases accessible to the configured PAT. Useful during setup to find your base ID or to verify the PAT has the right permissions.

Parameters

None.

Returns

{ bases: Array<{ id, name, permissionLevel }> }

airtable_create_base ⚡ haiku

Creates a new Airtable base pre-configured with the JobSync schema (all required columns, correct field types). Optionally sets the new base as the active base in config. Requires the schema.bases:write PAT scope.

Parameters

Param Type Description
workspaceId string (required) Airtable workspace ID to create the base in.
name string Base name. Defaults to "JobSync".
setAsActive boolean If true, writes the new baseId to config.json. Default: true.

Application Pipeline

The pipeline is a tab-separated file (~/.jobsync/pipeline.tsv) that tracks every job you're considering, have applied to, or are interviewing for. It's intentionally a flat file — easy to inspect, easy to back up, easy to query with a spreadsheet tool.

Pipeline statuses

Valid status values: pending · applied · interviewing · offer · rejected · withdrawn

pipeline_upsert_jobs ⚡ haiku

Adds new jobs to the pipeline or refreshes metadata for existing entries. The applyLink is the dedup key. For existing entries, status, appliedAt, and notes are preserved — only metadata fields (title, company, fit score, tags) are updated. Call this after fit analysis to persist all scored roles.
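
An illustrative jobs array item (all values invented; the fields are documented below):

{
  "positionTitle": "Backend Engineer",
  "company": "Vercel",
  "applyLink": "https://jobs.lever.co/vercel/abc-123",
  "location": "Remote (US)",
  "datePosted": "2024-05-01",
  "industry": "Developer Tools",
  "tags": "Remote",
  "fitScore": "8"
}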

Parameters — jobs array items

Field Type Description
positionTitle string (required) Job title.
company string (required) Company name.
applyLink string (required) Primary dedup key.
id string Stable job ID from ATS scraper.
location string
datePosted string YYYY-MM-DD format.
industry string
tags string Comma-separated tag string.
fitScore string Score 1–10 as a string.

pipeline_mark_applied ⚡ haiku

Marks one or more jobs as applied by their exact apply links. Sets appliedAt to today's date (YYYY-MM-DD). Call this whenever the user confirms they have submitted an application.

Parameters

Param Type Description
applyLinks string[] (required) Exact apply link URLs of submitted applications.

pipeline_update_status ⚡ haiku

Updates the status and/or notes of one or more pipeline entries. Identify jobs by applyLink (preferred) or id.

Parameters — updates array items

Field Type Description
status string (required) New status value (see valid values above).
applyLink string Primary lookup key.
id string Fallback if applyLink unavailable.
notes string Optional free-text note (e.g. recruiter name, next steps).

pipeline_get_status ⚡ haiku

Reads the full pipeline and returns entries grouped by status with summary counts. Feed this into a dashboard view or use it to decide where to focus next.

Parameters

Param Type Description
statusFilter string[] If provided, only return entries with these statuses. Omit for all entries.

Returns

{ summary: { pending, applied, interviewing, offer, rejected, withdrawn }, grouped: { [status]: job[] } }
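
A populated response might look like this, with counts invented and only one status group shown for brevity:

{
  "summary": { "pending": 12, "applied": 5, "interviewing": 2, "offer": 0, "rejected": 3, "withdrawn": 1 },
  "grouped": {
    "interviewing": [
      { "positionTitle": "Backend Engineer", "company": "Vercel", "applyLink": "https://jobs.lever.co/vercel/abc-123" }
    ]
  }
}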

Data Flow & Workflow Architecture

A complete scrape workflow — from cold start to Airtable record — follows this sequence:

1. Onboarding (first run)

profile_parse_resume → Extract raw resume text
profile_update_roles → Detect job roles from resume
portals_master_list → Get the ~60-company master list
portals_write → Save filtered company list to portals.yml
airtable_create_base → Create Airtable base (optional)

2. Scrape workflow (recurring)

portals_read → Load companies and search queries
fetch_*_jobs (per company) → Fetch raw job arrays from ATS APIs
cache_is_seen (bulk) → Filter out already-processed links
classify_job_batch → Apply title/location filters + tag detection
verify_job_link_batch → Confirm links are still active (optional)
airtable_upsert_job → Write accepted jobs to Airtable
pipeline_upsert_jobs → Add to local pipeline TSV
cache_mark_seen → Stamp processed links in SQLite

3. Application tracking (ongoing)

pipeline_get_status → Review pipeline dashboard
pipeline_mark_applied → Stamp applied date
pipeline_update_status → Track interview / offer / rejection

Model Hints

Every tool is tagged with a recommended model hint. MCP orchestrators that support model routing can use this to minimize cost without sacrificing accuracy.

Hint Model Used for
⚡ haiku claude-haiku-* Fast, deterministic operations: cache lookups, TSV reads/writes, ATS API fetches, simple YAML reads.
⚖ sonnet claude-sonnet-* Mid-complexity classification and write operations: industry detection, job batch classification, Airtable upserts, resume parsing.
🔬 opus claude-opus-* High-judgment tasks: role extraction and management where nuanced understanding of career trajectories matters.

Model hints are advisory — the MCP server does not enforce them. They are included in tool descriptions as ⚡ [Model hint: haiku] text so orchestrators can parse and act on them.

Frequently Asked Questions

Do I need Airtable? Can I use the markdown sink instead?
No, Airtable is optional. Set "sink": "markdown" in config and point markdownPath to any file. The Airtable PAT and baseId are only validated when sink is "airtable" or "both". The application pipeline (pipeline.tsv) works independently of the sink setting.
Why are fast-path fetchers disabled by default?
ATS API calls generate real HTTP traffic to company infrastructure. Enabling them by default on installation would cause surprise outbound requests. Set "enableFastPath": true explicitly once you understand what will be called — then the four fetcher tools appear in Claude's tool list.
Ashby returns null for datePosted. What date gets stored?
The server stamps today's date (ISO YYYY-MM-DD) as a fallback before writing to Airtable. This means Ashby-sourced jobs will always appear "posted today." Do not use datePosted for recency filtering on Ashby jobs — use the Airtable createdTime field instead.
How does the dedup cache interact with Airtable?
They operate independently. The SQLite cache tracks which applyLink URLs have been processed (fetched, classified, attempted upsert). Airtable dedup is based on whether a record with that applyLink already exists in the table. If you clear the cache on a new machine, use airtable_list_recent_jobs to pull existing records and re-seed the cache with cache_mark_seen.
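
A re-seed pass is just those two calls back to back; the arguments below are illustrative:

{ "name": "airtable_list_recent_jobs", "arguments": { "lookbackDays": 30 } }
{ "name": "cache_mark_seen", "arguments": { "jobs": [ { "applyLink": "https://jobs.lever.co/vercel/abc-123", "company": "Vercel", "positionTitle": "Backend Engineer" } ] } }
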
What happens if an ATS company slug returns a 404?
The fetcher returns an error result for that slug and continues with others in the batch. A 404 on a Greenhouse or Lever slug usually means the company doesn't use that ATS or has changed their board URL. Check the portals_master_list notes — some companies have specific Workday board URLs that must be used instead.
Can I run multiple scrape sessions concurrently?
Technically yes, but SQLite write contention on cache.db and concurrent TSV writes to pipeline.tsv can cause race conditions. Run scrape sessions sequentially. The MCP stdio transport is single-tenant by design — one Claude session per server process.
How do I add a company that isn't in the master list?
Edit portals.yml directly (or use portals_write) and add an entry under tracked_companies with the correct ats and slug. For Workday companies, use the boardUrl field instead of slug since Workday tenants don't follow a uniform slug pattern.
The server throws on startup about missing airtable.pat. How do I fix it?
Either add your Airtable PAT and baseId to ~/.jobsync/config.json, or change "sink" to "markdown" if you don't want Airtable. Run jobsync-mcp init to re-run the interactive setup, which will walk you through both options.
How do I score jobs for fit before applying?
Call profile_read to load your background, then run each job (or a batch) through classify_job_batch which returns enriched results. Have Claude add a fitScore (1–10) field based on your skills and experience, then call pipeline_upsert_jobs with the scored results. The score is stored in the pipeline TSV and forwarded to Airtable on upsert.
Is my resume or profile data sent anywhere?
Profile files live exclusively at ~/.jobsync/profile/ on your local machine. They are read into the Claude context window when you call profile_read, which means the contents are sent to Anthropic's API as part of your conversation — the same as any other text you share with Claude. No other third party receives them.