High-Level Design (HLD) — Job Sentinel¶

Version: 2.0
Author: Harshit Wandhare
Last updated: 2026

1. Purpose¶

Job Sentinel is a local-first career platform that monitors university job portals, aggregates listings from public job APIs, tracks the full application lifecycle, and generates tailored ATS résumés and cover letters — all on the user's own machine. The original design (portal scraping → Telegram alerts) is preserved and extended; new surfaces (web UI, job-source search, application tracker, AI match) are additive layers over the same core.

It is designed to run indefinitely on a personal Windows PC / WSL2 without cloud infrastructure.

2. Goals & Non-Goals¶

Goals¶

Real-time (≤15 min latency) alerts for new job postings from university portals
Site-agnostic portal scraping — adding a portal requires no core changes
Pluggable job-source layer — aggregate results from public APIs without a browser
AI profile↔job match — blended keyword + semantic + optional LLM rationale
Multi-provider LLM abstraction — Ollama local or any OpenAI-compatible cloud provider
Application lifecycle tracking (saved → interviewing → offer/rejected)
Generated-document library with ATS scores
Full CLI/API/Web parity — every feature reachable three ways
Zero cloud dependencies for the default path (PC / WSL2, Ollama)
Open-source, portfolio-quality codebase

Non-Goals¶

Auto-submission of applications (legal/ToS risk)
Multi-user SaaS hosting (single-user local deployment is the target)

3. System Components¶

┌──────────────────────────────────────────────────────────────────────────────┐
│                              Job Sentinel Process                            │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  Main Thread — asyncio event loop (bot mode)                          │  │
│  │     python-telegram-bot run_polling()                                 │  │
│  │     Handles: /jobs /recent /applied /ignore /status /stats /deadlines │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  Background Thread — APScheduler (1 worker, max_instances=1)          │  │
│  │     Scheduler._scrape_cycle()                                         │  │
│  │       ├─ SiteAdapter (Playwright/Chromium) → list[JobPosting]         │  │
│  │       ├─ JobRepository.save_job() + closed detection                  │  │
│  │       └─ TelegramNotifier + EmailNotifier (on new jobs)               │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  FastAPI (job-sentinel serve / web) — localhost only                  │  │
│  │     Profile CRUD · Jobs · Match · Applications · Documents            │  │
│  │     Sources (search/company/config) · LLM config/test · Ops/auth      │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  Job Source Layer (sources/) — HTTP/JSON, no browser                  │  │
│  │     No-key: RemoteOK · The Muse · Arbeitnow · Himalayas               │  │
│  │     Keyed:  Adzuna · USAJobs                                          │  │
│  │     Scraper: JobSpy (opt-in)                                          │  │
│  │     Company boards: Greenhouse · Lever · Ashby                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  LLM Provider Layer (documents/providers.py)                          │  │
│  │     ChatBackend / EmbedBackend protocols                              │  │
│  │     OllamaBackend  (native /api/chat, think:false)                    │  │
│  │     OpenAICompatClient (OpenAI · OpenRouter · Groq · Gemini · custom) │  │
│  │     Factories: build_chat_backend / build_embed_backend               │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
         │                                   │
         ▼                                   ▼
  SQLite jobs.db                      External services
  (job_postings, applications,        Telegram Bot API
   generated_documents)               Optional: cloud LLM APIs

4. Data Flows¶

4a. Portal scrape → alert¶

Portal Website
     │  HTTP (Playwright/Chromium)
     ▼
SiteAdapter.scrape()  →  list[JobPosting]
     │
Scheduler._scrape_cycle()
     ├── keyword filter
     ├── JobRepository.save_job()  (is_new? → new_jobs list)
     ├── closed detection  (mark missing postings CLOSED)
     └── TelegramNotifier + EmailNotifier (on new_jobs, unless dry_run)

4b. Public API job search (sources layer)¶

User (CLI / Web UI)
     │  keywords, location, filters
     ▼
aggregate_search(query, sources)
     │  fan-out, concurrent per-source search()
     ▼
list[JobPosting] + list[SourceError]  (ephemeral — not written to DB)

4c. AI profile↔job match¶

Profile (profile.yaml)  +  Job description (text or posting_id)
     │
match_profile_to_job()
     ├── 1. KeywordTailor → coverage score + matched/missing keywords
     ├── 2. EmbedBackend  → cosine similarity (skipped if unavailable)
     ├── 3. blend:  score = 0.5×coverage + 0.5×semantic  (or =coverage alone)
     └── 4. ChatBackend  → grounded rationale (fallback: deterministic text)
          ▼
     MatchResult (score, verdict, rationale, strengths, gaps)

4d. Application lifecycle¶

User action (CLI / Web UI)
     ├── apps add / POST /api/applications
     │       → Application row (stage=saved/applied/…)
     └── apps stage / PATCH /api/applications/{id}
             → update stage, notes, applied_date, …

5. Component Responsibilities¶

Component	Responsibility
`SiteAdapter`	Login + HTML/API scraping via Playwright
`Scheduler`	Interval timing, cycle orchestration, filter application
`JobRepository`	Upsert / query SQLite (sqlite-utils); schema v2 — job_postings, applications, generated_documents
`TelegramNotifier`	Format MarkdownV2 messages, HTTP POST to Bot API
`BotHandlers`	Async command handlers, user interaction
`Settings`	pydantic-settings config, type-safe env loading (CHAT_/EMBED_ + legacy OLLAMA_*)
`JobSource`	HTTP/JSON search against a public job API; no browser, no DB write
`aggregate_search`	Fan out a `JobQuery` to a list of `JobSource` instances; merge results; capture per-source errors
`ChatBackend` / `EmbedBackend`	Protocol types; `OllamaBackend` (native) and `OpenAICompatClient` implement both
`build_chat_backend` / `build_embed_backend`	Factories that read `LLMSettings` and return the right implementation
`match_profile_to_job`	Blend ATS keyword coverage + embedding cosine similarity + optional LLM rationale into a `MatchResult`
FastAPI app	Local HTTP layer; all business logic delegated to core; UI/CLI parity
Next.js web UI	Consumer of the FastAPI layer; demo mode (`NEXT_PUBLIC_DEMO=1`) for hosted preview

6. Concurrency Model¶

Main thread         Background thread (APScheduler)
─────────────       ──────────────────────────────
asyncio loop        Playwright (blocking I/O)
  └── bot polling     └── Chromium browser
  └── command         └── HTTP requests (httpx sync)
      handlers
          │
          │  thread-safe reads (SQLite WAL)
          ▼
       JobRepository (shared)

SQLite WAL mode allows concurrent reads from the bot thread while the scheduler thread is writing. No additional locking is needed.

7. Failure Modes & Recovery¶

Failure	Behaviour
Portal unreachable	Scrape returns empty list; logged; retried next interval
CAS login fails	Exception caught; logged; next cycle retries
Telegram API down	tenacity retries 3× with exponential backoff; logged
PC sleep/hibernate	APScheduler detects missed fire; runs immediately on wake
DB corruption	sqlite-utils raises; logged; next insert creates fresh DB

8. Security Considerations¶

Credentials stored only in local .env — never logged, never sent except to portal
gitleaks pre-commit hook blocks accidental credential commits
Telegram Bot API uses HTTPS only
.env is in .gitignore with detect-private-key hook as extra guard
No credentials in logs (loguru's diagnose=False in production log files)

9. Technology Choices¶

Decision	Choice	Rationale
Package manager	`uv`	10-100× faster than pip/poetry; replaces pyenv
Config	`pydantic-settings`	Type-safe, validates at startup, no scattered `os.getenv`
Logging	`loguru`	Zero-config, JSON mode, rotating files in one line
Browser	Playwright (Chromium)	Handles SPAs, handles CAS SSO, cross-platform
DB	`sqlite-utils` + SQLite	Zero infra, WAL-safe, ergonomic API
Scheduler	`apscheduler`	Handles sleep/hibernate miss-fires; cron-capable
HTTP	`httpx`	Async-capable, HTTP/2, replaces requests; used by job sources too
Retry	`tenacity`	Declarative retry with exponential backoff
CLI	`typer` + `rich`	Beautiful auto-help, zero boilerplate
Bot	`python-telegram-bot` v21	Async, Bot API 7.x, actively maintained
Linting	`ruff`	Replaces black+flake8+isort; 100× faster
LLM abstraction	`documents/providers.py`	Protocol-based; OllamaBackend (native) + OpenAICompatClient cover all providers without the openai SDK
Web UI	Next.js 15 / React 19 / Tailwind	App Router; `lib/api.ts` typed client; vitest; demo mode via `NEXT_PUBLIC_DEMO`

See docs/adr/ for individual Architecture Decision Records, including ADR 005 (job source layer) and ADR 006 (AI personalisation and data strategy).

10. Future Evolution¶

v1 (done)     v2 (done)                   v3 (planned)
──────────    ─────────────────────────   ──────────────────────────────────
Local PC  →   Docker / VPS            →   Packaged installers + PyPI publish
Telegram  →   + Discord/Email         →   Discord webhook notifier
SQLite    →   schema v2 (apps/docs)   →   deeper analytics
Manual    →   AI match + BYO-LLM     →   ghost-job signals, ATS simulation
Portal    →   + 7+ public sources     →   more portal adapters (Workday…)