High-Level Design (HLD) — Job Sentinel¶
Version: 2.0
Author: Harshit Wandhare
Last updated: 2026
1. Purpose¶
Job Sentinel is a local-first career platform that monitors university job portals, aggregates listings from public job APIs, tracks the full application lifecycle, and generates tailored ATS résumés and cover letters — all on the user's own machine. The original design (portal scraping → Telegram alerts) is preserved and extended; new surfaces (web UI, job-source search, application tracker, AI match) are additive layers over the same core.
It is designed to run indefinitely on a personal Windows PC / WSL2 without cloud infrastructure.
2. Goals & Non-Goals¶
Goals¶
- Real-time (≤15 min latency) alerts for new job postings from university portals
- Site-agnostic portal scraping — adding a portal requires no core changes
- Pluggable job-source layer — aggregate results from public APIs without a browser
- AI profile↔job match — blended keyword + semantic + optional LLM rationale
- Multi-provider LLM abstraction — Ollama local or any OpenAI-compatible cloud provider
- Application lifecycle tracking (saved → interviewing → offer/rejected)
- Generated-document library with ATS scores
- Full CLI/API/Web parity — every feature reachable three ways
- Zero cloud dependencies for the default path (PC / WSL2, Ollama)
- Open-source, portfolio-quality codebase
Non-Goals¶
- Auto-submission of applications (legal/ToS risk)
- Multi-user SaaS hosting (single-user local deployment is the target)
3. System Components¶
┌──────────────────────────────────────────────────────────────────────────────┐
│ Job Sentinel Process │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Main Thread — asyncio event loop (bot mode) │ │
│ │ python-telegram-bot run_polling() │ │
│ │ Handles: /jobs /recent /applied /ignore /status /stats /deadlines │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Background Thread — APScheduler (1 worker, max_instances=1) │ │
│ │ Scheduler._scrape_cycle() │ │
│ │ ├─ SiteAdapter (Playwright/Chromium) → list[JobPosting] │ │
│ │ ├─ JobRepository.save_job() + closed detection │ │
│ │ └─ TelegramNotifier + EmailNotifier (on new jobs) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ FastAPI (job-sentinel serve / web) — localhost only │ │
│ │ Profile CRUD · Jobs · Match · Applications · Documents │ │
│ │ Sources (search/company/config) · LLM config/test · Ops/auth │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Job Source Layer (sources/) — HTTP/JSON, no browser │ │
│ │ No-key: RemoteOK · The Muse · Arbeitnow · Himalayas │ │
│ │ Keyed: Adzuna · USAJobs │ │
│ │ Scraper: JobSpy (opt-in) │ │
│ │ Company boards: Greenhouse · Lever · Ashby │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ LLM Provider Layer (documents/providers.py) │ │
│ │ ChatBackend / EmbedBackend protocols │ │
│ │ OllamaBackend (native /api/chat, think:false) │ │
│ │ OpenAICompatClient (OpenAI · OpenRouter · Groq · Gemini · custom) │ │
│ │ Factories: build_chat_backend / build_embed_backend │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
SQLite jobs.db External services
(job_postings, applications, Telegram Bot API
generated_documents) Optional: cloud LLM APIs
4. Data Flows¶
4a. Portal scrape → alert¶
Portal Website
│ HTTP (Playwright/Chromium)
▼
SiteAdapter.scrape() → list[JobPosting]
│
Scheduler._scrape_cycle()
├── keyword filter
├── JobRepository.save_job() (is_new? → new_jobs list)
├── closed detection (mark missing postings CLOSED)
└── TelegramNotifier + EmailNotifier (on new_jobs, unless dry_run)
4b. Public API job search (sources layer)¶
User (CLI / Web UI)
│ keywords, location, filters
▼
aggregate_search(query, sources)
│ fan-out, concurrent per-source search()
▼
list[JobPosting] + list[SourceError] (ephemeral — not written to DB)
4c. AI profile↔job match¶
Profile (profile.yaml) + Job description (text or posting_id)
│
match_profile_to_job()
├── 1. KeywordTailor → coverage score + matched/missing keywords
├── 2. EmbedBackend → cosine similarity (skipped if unavailable)
├── 3. blend: score = 0.5×coverage + 0.5×semantic (or =coverage alone)
└── 4. ChatBackend → grounded rationale (fallback: deterministic text)
▼
MatchResult (score, verdict, rationale, strengths, gaps)
4d. Application lifecycle¶
User action (CLI / Web UI)
├── apps add / POST /api/applications
│ → Application row (stage=saved/applied/…)
└── apps stage / PATCH /api/applications/{id}
→ update stage, notes, applied_date, …
5. Component Responsibilities¶
| Component | Responsibility |
|---|---|
SiteAdapter |
Login + HTML/API scraping via Playwright |
Scheduler |
Interval timing, cycle orchestration, filter application |
JobRepository |
Upsert / query SQLite (sqlite-utils); schema v2 — job_postings, applications, generated_documents |
TelegramNotifier |
Format MarkdownV2 messages, HTTP POST to Bot API |
BotHandlers |
Async command handlers, user interaction |
Settings |
pydantic-settings config, type-safe env loading (CHAT_/EMBED_ + legacy OLLAMA_*) |
JobSource |
HTTP/JSON search against a public job API; no browser, no DB write |
aggregate_search |
Fan out a JobQuery to a list of JobSource instances; merge results; capture per-source errors |
ChatBackend / EmbedBackend |
Protocol types; OllamaBackend (native) and OpenAICompatClient implement both |
build_chat_backend / build_embed_backend |
Factories that read LLMSettings and return the right implementation |
match_profile_to_job |
Blend ATS keyword coverage + embedding cosine similarity + optional LLM rationale into a MatchResult |
| FastAPI app | Local HTTP layer; all business logic delegated to core; UI/CLI parity |
| Next.js web UI | Consumer of the FastAPI layer; demo mode (NEXT_PUBLIC_DEMO=1) for hosted preview |
6. Concurrency Model¶
Main thread Background thread (APScheduler)
───────────── ──────────────────────────────
asyncio loop Playwright (blocking I/O)
└── bot polling └── Chromium browser
└── command └── HTTP requests (httpx sync)
handlers
│
│ thread-safe reads (SQLite WAL)
▼
JobRepository (shared)
SQLite WAL mode allows concurrent reads from the bot thread while the scheduler thread is writing. No additional locking is needed.
7. Failure Modes & Recovery¶
| Failure | Behaviour |
|---|---|
| Portal unreachable | Scrape returns empty list; logged; retried next interval |
| CAS login fails | Exception caught; logged; next cycle retries |
| Telegram API down | tenacity retries 3× with exponential backoff; logged |
| PC sleep/hibernate | APScheduler detects missed fire; runs immediately on wake |
| DB corruption | sqlite-utils raises; logged; next insert creates fresh DB |
8. Security Considerations¶
- Credentials stored only in local
.env— never logged, never sent except to portal gitleakspre-commit hook blocks accidental credential commits- Telegram Bot API uses HTTPS only
.envis in.gitignorewithdetect-private-keyhook as extra guard- No credentials in logs (loguru's
diagnose=Falsein production log files)
9. Technology Choices¶
| Decision | Choice | Rationale |
|---|---|---|
| Package manager | uv |
10-100× faster than pip/poetry; replaces pyenv |
| Config | pydantic-settings |
Type-safe, validates at startup, no scattered os.getenv |
| Logging | loguru |
Zero-config, JSON mode, rotating files in one line |
| Browser | Playwright (Chromium) | Handles SPAs, handles CAS SSO, cross-platform |
| DB | sqlite-utils + SQLite |
Zero infra, WAL-safe, ergonomic API |
| Scheduler | apscheduler |
Handles sleep/hibernate miss-fires; cron-capable |
| HTTP | httpx |
Async-capable, HTTP/2, replaces requests; used by job sources too |
| Retry | tenacity |
Declarative retry with exponential backoff |
| CLI | typer + rich |
Beautiful auto-help, zero boilerplate |
| Bot | python-telegram-bot v21 |
Async, Bot API 7.x, actively maintained |
| Linting | ruff |
Replaces black+flake8+isort; 100× faster |
| LLM abstraction | documents/providers.py |
Protocol-based; OllamaBackend (native) + OpenAICompatClient cover all providers without the openai SDK |
| Web UI | Next.js 15 / React 19 / Tailwind | App Router; lib/api.ts typed client; vitest; demo mode via NEXT_PUBLIC_DEMO |
See docs/adr/ for individual Architecture Decision Records, including ADR 005 (job source layer) and ADR 006 (AI personalisation and data strategy).
10. Future Evolution¶
v1 (done) v2 (done) v3 (planned)
────────── ───────────────────────── ──────────────────────────────────
Local PC → Docker / VPS → Packaged installers + PyPI publish
Telegram → + Discord/Email → Discord webhook notifier
SQLite → schema v2 (apps/docs) → deeper analytics
Manual → AI match + BYO-LLM → ghost-job signals, ATS simulation
Portal → + 7+ public sources → more portal adapters (Workday…)