Architecture
› concurrency under I/O wait — not raw speed
production gotcha
› requests.get() inside async kills the event loop
› orchestrators only — no business logic, no direct DB
scales badly
› fine at 3 endpoints, disaster at 30
› schemas → deps → routers → services
› testable without HTTP stack — pure Python
› push sync code to thread pool — unblocks event loop
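The thread-pool offload above can be sketched with stdlib `asyncio.to_thread`; `blocking_io` is an illustrative stand-in for any sync call (FastAPI does the same offload automatically for plain `def` routes):

```python
import asyncio
import time

def blocking_io() -> str:
    # stands in for requests.get() or any sync driver call
    time.sleep(0.05)
    return "done"

async def handler() -> str:
    # runs blocking_io in the default thread pool; the event
    # loop stays free to serve other requests in the meantime
    return await asyncio.to_thread(blocking_io)

result = asyncio.run(handler())
```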
Schemas — Pydantic V2
› Create / Update / Response — never mix them
40+ entities
› 120+ schema classes — cognitive overhead becomes real
› PatientBase → PatientCreate / PatientResponse
› bridges ORM objects to Pydantic — without it: 500
› only fields client sent — prevents null overwrite
› single-field rules — @classmethod, raises ValueError
› cross-field rules — start_date before end_date
production gotcha
› client sends null → column NOT NULL → DB write fails
› safe update — never OrmModel(**dict) in PATCH
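A minimal sketch of the Base/Create/Update/Response split and the `exclude_unset` PATCH pattern; field names are illustrative:

```python
from typing import Optional
from pydantic import BaseModel, ConfigDict

class PatientBase(BaseModel):
    name: str
    email: str

class PatientCreate(PatientBase):
    pass

class PatientUpdate(BaseModel):
    # every field optional: a PATCH sends only what changes
    name: Optional[str] = None
    email: Optional[str] = None

class PatientResponse(PatientBase):
    id: int
    # lets Pydantic read attributes off ORM objects; without
    # it, returning an ORM instance blows up with a 500
    model_config = ConfigDict(from_attributes=True)

# safe PATCH: only the fields the client actually sent
patch = PatientUpdate.model_validate({"name": "Ada"})
changes = patch.model_dump(exclude_unset=True)
```

`changes` contains only `name`; an unsent `email` never becomes an explicit null that overwrites a NOT NULL column.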
Routing
› never put routes directly on app
ordering bug
› /me before /{id} — registration order matters
silent failure
› "me" passed as patient_id → nonsense 404
› anchor at lowest owner that makes auth sense
› /patients/{id}/appointments — DB ownership in URL
3-level nesting
› /patients/{patient_id}/appointments/{appt_id}/notes — path params need distinct names · no global URL
HTTP Status Codes
› 201 POST created · 204 DELETE no content
› 422 = wrong shape · 400 = wrong meaning
› 401 = who are you · 403 = I know, you can't
403 leaks existence
› return 404 instead of 403 to prevent resource enumeration
common mistake
› never raise 422 for business rule violations
› DB/Redis unreachable — infrastructure, not your bug
Dependency Injection
› commit on success · rollback on exception
production gotcha
› skip refresh → DetachedInstanceError in production
› same get_db twice = one session — not two transactions
› get_db → get_current_user → require_admin chain
› declarative at route level — beats g/current_app
› swap get_db in tests without touching route code
› opt-in per route — no exclusion lists needed
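The commit-on-success / rollback-on-exception dependency, sketched with a fake session and driven by hand the way FastAPI drives a `yield` dependency (real code would yield a `SessionLocal()` and call `db.refresh()` before returning objects):

```python
class FakeSession:
    # minimal stand-in for a SQLAlchemy Session
    def __init__(self):
        self.committed = False
        self.rolled_back = False
        self.closed = False
    def commit(self):
        self.committed = True
    def rollback(self):
        self.rolled_back = True
    def close(self):
        self.closed = True

def get_db():
    db = FakeSession()      # real code: SessionLocal()
    try:
        yield db
        db.commit()         # commit only if the route body succeeded
    except Exception:
        db.rollback()       # any route exception rolls back
        raise
    finally:
        db.close()

# success path: FastAPI resumes the generator after the route returns
gen = get_db()
db = next(gen)
try:
    next(gen)
except StopIteration:
    pass

# failure path: the route's exception is thrown into the generator
gen2 = get_db()
db2 = next(gen2)
try:
    gen2.throw(RuntimeError("boom"))
except RuntimeError:
    pass
```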
Error Handling
› route inline · service domain · global handler
› PatientNotFoundError — zero HTTP knowledge in service
coupling bug
› service coupled to HTTP — breaks gRPC/CLI/WebSocket
› last resort — log traceback, return generic 500
wrong twice
› exposes internals AND returns 200 status
silent failure
› asyncio.create_task() exceptions silently dropped
Lifespan
› replaces @app.on_event — startup/shutdown with yield
production gotcha
› worker restarts leave ghost connections → pool exhausted
› init in dependency order — raise on failure, don't swallow
› 30s default grace period (e.g. Kubernetes) — shutdown must complete within window
› shared HTTP client, Redis — accessible anywhere
Middleware
› logging · CORS · request ID · timing — never auth
ordering trap
› last registered = outermost = runs first on request
exclusion list drift
› exclusion list drifts — new public route silently requires auth
› response body is a stream — can't read without buffering
› MW=every req, runs on 404s · Dep=selected routes
Health Checks
› SELECT 1 · Redis ping — never just return 200
› /health/live = process alive · /health/ready = deps ok
mass pod restart
› liveness check probes DB → DB hiccup → all pods restart at once
› diagnostic info without exposing DB connection strings
lies to infra
› load balancer routes traffic to dead pod
pydantic-settings
› BaseSettings — typed env vars, crash on startup if missing
› missing env var crashes import — not mid-request
credential leak
› silently falls back to dev credential in production
› enables dependency_overrides in tests
test isolation issue
› module-level Settings() can't swap env vars between tests
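The fail-fast idea, sketched as a dependency-free stand-in for pydantic-settings' `BaseSettings` (the real class adds typed parsing and `.env` support): a missing required var raises at construction, never mid-request, and there is no silent dev fallback:

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    # stdlib stand-in for BaseSettings: a missing DATABASE_URL
    # raises KeyError at startup, not on the first real request
    database_url: str = field(
        default_factory=lambda: os.environ["DATABASE_URL"])
    debug: bool = field(
        default_factory=lambda: os.environ.get("DEBUG", "0") == "1")

os.environ["DATABASE_URL"] = "postgresql://localhost/demo"  # demo only
os.environ.pop("DEBUG", None)
settings = Settings()   # build once at startup; expose via a
                        # get_settings() dependency so tests can override
```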
JWT Authentication
› login → token → Bearer header → dep decodes → user injected
payload is public
› payload is base64-readable — never put PII in JWT
› short-lived access + long-lived refresh — UX vs security
30-day window
› without server-side revocation — valid for full expiry
auth bypass
› client provides patient_id param → any user reads any data
› patient_id always from verified JWT — never from query param
› bad token + deleted user → both 401 — no info leak
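"Payload is public" is literal: the middle segment of a JWT is just URL-safe base64, readable without any key. A stdlib demonstration with an illustrative, unsigned token:

```python
import base64
import json

# build a token with the classic header.payload.signature shape;
# the claims below are illustrative
header = base64.urlsafe_b64encode(
    json.dumps({"alg": "HS256"}).encode()).rstrip(b"=")
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "user-17", "role": "clinician"}).encode()).rstrip(b"=")
token = b".".join([header, payload, b"signature"]).decode()

# anyone holding the token can read the claims: no key required
raw = token.split(".")[1]
raw += "=" * (-len(raw) % 4)        # restore stripped base64 padding
claims = json.loads(base64.urlsafe_b64decode(raw))
```

This is why PII never belongs in a JWT, and why the trusted user identity comes from the *verified* token inside a dependency, never from a query param.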
SSE vs WebSockets
› server→client only · HTTP · auto-reconnect · HTTP/2
› bidirectional · chat · collaborative editing · gaming
nginx / ALB trap
› buffered responses — client sees nothing until close
› nginx buffering off + ALB idle timeout config
› media_type text/event-stream — async generator yields chunks
Production Gotchas
split-brain
› in-memory cache per worker — write in W1 invisible to W2
never in prod
› file watchers consume CPU — unexpected restarts under load
› structlog / JSON formatter — queryable by log aggregators
unqueryable
› unstructured text — can't query in Datadog/CloudWatch
› default pool_size=5 × 4 workers = 20 persistent conns · max_overflow adds burst on top
traffic spike
› pool at max under load = next connection blocks
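A hedged SQLAlchemy engine config sketch for the pool math above; the URL and numbers are illustrative, and total connections scale as workers × (pool_size + max_overflow), which must stay under the database's connection limit:

```python
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://app@db/app",   # illustrative URL
    pool_size=5,          # persistent connections per worker
    max_overflow=10,      # burst headroom before callers block
    pool_timeout=30,      # seconds to wait for a conn before erroring
    pool_pre_ping=True,   # drop dead conns left by worker restarts
)
```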
Testing
› sync TestClient first — AsyncClient only when needed
› full request-to-response test with in-memory DB
› transaction wraps test, rollback after — clean state
› DI system handles the swap — test real code paths