ArgoHarvest: Why I Built a Job Scraper

Job searching in 2026 is broken. You know it. I know it. Everyone who’s applied to 200 jobs and gotten 3 responses knows it.

The boards are flooded with ghost postings — listings that companies leave up for months with no intention of filling the role. Recruiters repost the same job every two weeks to keep it looking fresh. Job descriptions are keyword-stuffed nightmares that require 10 years of experience in a 5-year-old technology. And if you want to search across multiple boards? Good luck. LinkedIn shows you one thing, Indeed shows you another, and neither lets you filter by what actually matters.

I got tired of manually checking five sites every morning, copying job links into a spreadsheet, and trying to remember which ones I’d already applied to. I wanted one tool that:

  1. Scraped all the major boards automatically
  2. Filtered out the noise (clearance-required jobs, reposted listings, keyword bait)
  3. Scored each listing based on my priorities (tech stack, remote-friendly, salary range)
  4. Gave me a clean feed of jobs actually worth applying to

So I forked python-jobspy, renamed it ArgoHarvest, and started building.


The Series at a Glance

  • Part 1: Why (this page) — The job search problem and why scraping is the answer. Key theme: when the tools don’t work, build your own.
  • Part 2: Scraper Architecture — LinkedIn, Indeed, Glassdoor: how each one works. Key theme: every site is a different puzzle.
  • Part 3: The Anti-Bot Arms Race — TLS fingerprinting, residential proxies, detection evasion. Key theme: they don’t want you scraping. Too bad.
  • Part 4: The Scoring System — Personalized job ranking based on your priorities. Key theme: finding signal in the noise.
  • Part 5: Deployment & What’s Next — Running it, scheduling it, where it’s headed. Key theme: from script to daily driver.

The Problem With Job Boards

Let’s talk about what’s actually wrong.

Ghost postings are everywhere. Companies post jobs they’ve already filled internally, or jobs they’re not ready to fill, or jobs that are just there to “build a pipeline.” You apply, you hear nothing, the listing stays up for three more months.

Keyword gaming is rampant. Job descriptions are written for applicant tracking systems (ATS), not humans. They list every technology under the sun to cast the widest net, making it impossible to tell what the role actually involves.

Cross-board search doesn’t exist. LinkedIn, Indeed, Glassdoor, ZipRecruiter — they’re all siloed. Each has different search syntax, different filters, different results for the same query. Comparing across boards means opening five tabs and doing manual deduplication.

No personalized ranking. Every board shows you the same results as everyone else. There’s no way to say “I care about Python and remote work, I don’t care about Java or on-site, and anything requiring a clearance is an automatic no.”

What ArgoHarvest Does

ArgoHarvest is a Python tool that scrapes multiple job boards, deduplicates listings, applies your personal scoring criteria, and outputs a ranked feed of jobs that actually match what you’re looking for.

The core loop is simple:

Configure search → Scrape boards → Deduplicate → Score → Export

You define your search parameters (keywords, location, remote preference), your scoring weights (tech stack, salary, company size), and your filters (no clearance, no contract-to-hire). ArgoHarvest runs the searches, collects the results, scores each one, and gives you a sorted CSV or JSON feed.
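As a minimal sketch, the configuration and scoring step described above could look like the following. Every key, weight, and the `score` helper here are hypothetical illustrations of the idea, not ArgoHarvest's actual schema:

```python
# Hypothetical config: search parameters, scoring weights, and hard filters.
CONFIG = {
    "search": {
        "keywords": ["python developer", "backend engineer"],
        "location": "Remote",
        "hours_old": 24,            # only listings from the last day
    },
    "weights": {                    # how much each signal contributes to the score
        "tech_stack": 0.5,
        "salary": 0.3,
        "remote": 0.2,
    },
    "filters": {
        "exclude_clearance": True,
        "exclude_contract_to_hire": True,
    },
}

def score(job: dict, weights: dict) -> float:
    """Weighted sum of per-signal scores, each signal normalized to [0, 1]."""
    return sum(weight * job.get(signal, 0.0) for signal, weight in weights.items())
```

Sorting the scraped listings by `score(job, CONFIG["weights"])` descending is what turns a raw dump into a ranked feed.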

I run it every morning at 6 AM via cron. By the time I have coffee, I have a ranked list of new listings from the last 24 hours.

The Fork: Why python-jobspy?

I didn’t write a scraper from scratch. python-jobspy already had the hard part done — the actual HTTP request logic for each board, the HTML parsing, the result normalization. It’s a solid project.

But it had gaps:

  • No proxy rotation — LinkedIn and Indeed rate-limit aggressively. You need residential proxies.
  • No TLS fingerprint management — Modern anti-bot systems don’t just check your IP. They check your TLS handshake. Python’s requests library has a distinctive TLS fingerprint that screams “bot.”
  • No scoring system — It gave you raw results. Making them useful was your problem.
  • Bugs — Several scrapers had silent failures where they’d return empty results instead of raising errors.

ArgoHarvest fixes all of that. It’s not a wrapper around python-jobspy — it’s a hardened fork with proxy rotation, TLS fingerprint spoofing, a configurable scoring engine, and bug fixes that make the scrapers actually reliable.
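To make the proxy-rotation and TLS-spoofing ideas concrete, here is a minimal sketch of how the two combine. The proxy endpoints are placeholders and the helper is my own illustration; `impersonate="chrome"` is curl_cffi's real mechanism for mimicking Chrome's TLS handshake.

```python
import itertools

# Placeholder residential proxy endpoints (not real credentials).
PROXIES = [
    "http://user:pass@res-proxy-1.example:8000",
    "http://user:pass@res-proxy-2.example:8000",
    "http://user:pass@res-proxy-3.example:8000",
]
_pool = itertools.cycle(PROXIES)

def request_kwargs() -> dict:
    """Build per-request kwargs: the next proxy in rotation, plus a
    Chrome TLS fingerprint via curl_cffi's `impersonate` option."""
    proxy = next(_pool)
    return {
        "impersonate": "chrome",    # curl_cffi sends Chrome's TLS handshake
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 15,
    }

# Usage (requires `pip install curl_cffi`):
# from curl_cffi import requests
# resp = requests.get("https://www.linkedin.com/jobs/search", **request_kwargs())
```

Rotating the proxy per request spreads traffic across IPs so no single address trips the rate limiter, while the impersonated handshake keeps the TLS layer from giving the client away.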

The Stack

  • Python 3.11+ — Because that’s what python-jobspy uses and I’m not rewriting it in Rust (yet)
  • httpx — Async HTTP client with better TLS control than requests
  • Residential proxies — Rotating IPs from a proxy provider (more on this in Part 3)
  • curl_cffi — For TLS fingerprint impersonation (makes Python look like Chrome to anti-bot systems)
  • SQLite — Local storage for deduplication and history
  • Cron — Daily scheduling. Nothing fancy needed.
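As one way the SQLite dedup layer could work, here is a sketch that keys each listing on a hash of normalized company and title, and treats a primary-key collision as "already seen." The schema and hashing choice are illustrative, not necessarily what ArgoHarvest ships:

```python
import hashlib
import sqlite3

# In-memory DB for the sketch; the real tool would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY, url TEXT)")

def is_new(company: str, title: str, url: str) -> bool:
    """Return True the first time a (company, title) pair is recorded.

    Normalizing case before hashing catches the same role reposted
    with trivially different formatting across boards.
    """
    key = hashlib.sha256(f"{company.lower()}|{title.lower()}".encode()).hexdigest()
    try:
        conn.execute("INSERT INTO seen (key, url) VALUES (?, ?)", (key, url))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # primary key collision: seen before
        return False
```

Because the key survives across runs, this also filters the "reposted every two weeks" listings out of the daily feed.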

What I Learned Building It

The biggest surprise wasn’t the technical stuff. It was discovering how much effort job boards put into preventing exactly what I’m doing. LinkedIn has a whole team dedicated to detecting and blocking scrapers. Indeed uses Cloudflare’s bot detection. Glassdoor requires authentication for most data.

The anti-bot arms race is a rabbit hole. TLS fingerprinting alone is a fascinating topic — the idea that your encryption handshake identifies you as a bot before you even send an HTTP request. Part 3 is entirely about this.

But the thing that made ArgoHarvest actually useful wasn’t the scraping — it was the scoring. Turning 200 raw listings into 15 worth applying to. That’s where the value is.


Next up: Part 2 — Scraper Architecture — How LinkedIn, Indeed, and Glassdoor each present unique challenges, and how ArgoHarvest handles all of them.