Reading Source Code: How I Do It Without Going Insane

// essay · HiveCore Dev · 2026-05-09

Start with the entry point

Most ecosystems have a manifest or a convention that tells you how the program is launched. In Node.js it’s package.json; in Python it’s pyproject.toml or setup.cfg; in Go, go.mod only names the module, and the entry points are the package main directories, conventionally under cmd/. Open that file first and locate the field that actually starts the process: main or scripts.start in package.json, [project.scripts] in pyproject.toml, or console_scripts under entry_points in setup.cfg. Then open the referenced file and read it line by line without chasing any imports.
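As a rough sketch of that first lookup, the helper below collects launch hints from whichever manifests exist in a repository root. The function name find_entry_points and the hint labels are my own; it assumes Python 3.11+ for tomllib (it skips pyproject.toml on older versions) and treats each directory under cmd/ as a Go entry point, which is a convention rather than a rule.

```python
import json
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older interpreters: skip pyproject.toml
    tomllib = None


def find_entry_points(repo: Path) -> dict[str, str]:
    """Collect launch hints from the manifests present in a repository root."""
    hints: dict[str, str] = {}

    package_json = repo / "package.json"
    if package_json.is_file():
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
        if "main" in manifest:
            hints["package.json main"] = manifest["main"]
        start = manifest.get("scripts", {}).get("start")
        if start:
            hints["package.json scripts.start"] = start

    pyproject = repo / "pyproject.toml"
    if tomllib is not None and pyproject.is_file():
        with pyproject.open("rb") as fh:
            data = tomllib.load(fh)
        for name, target in data.get("project", {}).get("scripts", {}).items():
            hints[f"pyproject.toml script {name}"] = target

    # go.mod has no launch field; by convention each directory under cmd/
    # holds one `package main`.
    cmd_dir = repo / "cmd"
    if cmd_dir.is_dir():
        for sub in sorted(p for p in cmd_dir.iterdir() if p.is_dir()):
            hints[f"go cmd/{sub.name}"] = f"./cmd/{sub.name}"

    return hints
```

Calling find_entry_points(Path(".")) from a repository root returns a small dictionary you can paste straight into your notes file.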

The goal is to see the “big picture” skeleton: which framework is being used, how configuration is loaded, whether there is a dependency‑injection container, and where the HTTP server (or CLI parser) is instantiated. In practice you will spend about 20‑30 minutes on this step and emerge with a mental outline that most contributors never write down. That outline is the coordinate system you will use for every subsequent dive.

Take notes in a plain‑text file. Write down the name of the entry file, the framework version, and any top‑level middleware that is attached. This tiny artifact saves you from re‑discovering the same facts when you switch branches later.

Follow one feature, end to end

Pick a concrete, user-visible behavior: “create a new project,” “reset a password,” or “fetch the account balance.” Start at the outermost layer: locate the component that renders the button, or the route or handler that serves the endpoint. Trace the call chain forward: component → service → repository → database query. Do not try to understand unrelated modules; deliberately ignore everything else.

Document the path as you go. A simple markdown table with columns “Layer,” “File,” “Function,” and “Key data” is enough. When you hit a function that delegates to a third‑party library, note the library version and the contract it expects. When you encounter a conditional branch, ask yourself whether the branch is exercised by the feature you are following; if not, mark it as “out of scope for this trace.”
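If you prefer to keep the trace as data and generate the table, a minimal sketch might look like the following. The TraceStep type, the render_trace helper, and every file and function name in the sample trace are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TraceStep:
    layer: str
    file: str
    function: str
    key_data: str


def render_trace(steps: list[TraceStep]) -> str:
    """Render the trace as a markdown table with the four columns above."""
    rows = [
        "| Layer | File | Function | Key data |",
        "| --- | --- | --- | --- |",
    ]
    for s in steps:
        rows.append(f"| {s.layer} | {s.file} | {s.function} | {s.key_data} |")
    return "\n".join(rows)


# Hypothetical trace of a "reset a password" flow; all names are made up.
steps = [
    TraceStep("Route", "routes/auth.py", "reset_password", "email"),
    TraceStep("Service", "services/accounts.py", "issue_reset_token", "user_id, token"),
    TraceStep("Repository", "repos/tokens.py", "save_token", "token hash, expiry"),
]
print(render_trace(steps))
```

Keeping the steps as a list makes it easy to append an “out of scope for this trace” row when you skip a branch.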

After you have a complete flow, run the feature in a debugger or add temporary console.log or print statements at the entry and exit points. Verify that the values you inferred from the code match the runtime values. This step also exposes code that looks relevant but never runs on the path you care about; if nothing else calls it either, it is a candidate for dead code.
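In Python, one low-effort way to add that temporary entry and exit logging is a throwaway decorator. This is only a sketch: trace_calls is my own helper, and fetch_balance is a hypothetical stand-in for whatever function you are inspecting.

```python
import functools


def trace_calls(func):
    """Temporarily log arguments on entry and the result on exit."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"ENTER {func.__name__} args={args!r} kwargs={kwargs!r}")
        result = func(*args, **kwargs)
        # Not reached if func raises; the traceback covers that case.
        print(f"EXIT  {func.__name__} -> {result!r}")
        return result

    return wrapper


@trace_calls
def fetch_balance(account_id: int) -> float:
    # Hypothetical stand-in for the real function under inspection.
    return {1: 42.5}.get(account_id, 0.0)


fetch_balance(1)
```

Remove the decorator before committing; it is a reading aid, not instrumentation.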

Repeat the process for two or three additional features that touch different subsystems (e.g., a read‑only API call, a background job, and a webhook). The resulting collection of traces forms a mental map of the entire architecture without ever needing to read every file.

Read tests when the code is unclear

Good test suites are executable documentation. When a function’s intent is opaque, locate its test file. In many projects the convention is something like src/module/__tests__/foo.test.ts or tests/unit/test_foo.py. Use a project-wide grep to find references:

git grep -n -F 'function_name(' -- '*.test.*' '*test_*.py'

Look at the fixture setup, the input arguments, and the expected assertions. If the test uses a mock, follow the mock definition – it often reveals the external contract the function relies on. When the test is parameterised, read the data table; it usually enumerates the edge cases the author cared about.

If the repository lacks tests for a critical area, add a minimal test that captures the observed behavior. This does two things: it locks down the semantics you just inferred, and it gives future readers a concrete entry point.
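Such a test is often called a characterization test: it records what the code does today, not what it should do. The sketch below uses a made-up function, normalize_sku, as a stand-in for the untested legacy code, and the expected values are simply what that stand-in returns.

```python
def normalize_sku(raw: str) -> str:
    # Hypothetical stand-in for an untested legacy function.
    return raw.strip().upper().replace(" ", "-")


def test_normalize_sku_observed_behavior():
    """Pins down current behavior; it does not claim the behavior is right."""
    cases = [
        ("ab 12", "AB-12"),
        ("  ab12 ", "AB12"),
        ("", ""),
        ("a  b", "A--B"),  # surprising, but it is what the code does today
    ]
    for raw, expected in cases:
        assert normalize_sku(raw) == expected, (raw, expected)
```

The function is written so that a runner such as pytest would collect it, and it can also be called directly. Cases like the double-space one are worth keeping precisely because they look wrong: they mark questions to raise with whoever owns the module.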

git log -p on a single file

Historical context is a missing dimension in most code reviews. When a function looks like a kludge, run:

git log -p --follow path/to/file.ts | less

The diff view shows each change, who made it, and the accompanying commit message. A pattern often emerges: a short, well‑named function that grew into a 200‑line monster because a series of hotfixes were piled on without refactoring. Spot the commit that introduced the first “weird thing” and read the associated issue or PR description if it exists.
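To see at a glance when a file ballooned, you can summarize the same history as lines added and deleted per commit. The sketch below wraps git log --follow --numstat; the Change type, the "@" header marker, and the helper names are my own choices, and file_history assumes git is on PATH and the working directory is inside the repository.

```python
import subprocess
from typing import NamedTuple


class Change(NamedTuple):
    sha: str
    date: str
    subject: str
    added: int
    deleted: int


LOG_FORMAT = "@%h%x09%ad%x09%s"  # "@" marks a commit header line; %x09 is a tab


def parse_numstat_log(output: str) -> list[Change]:
    """Parse `git log --numstat --format=LOG_FORMAT` output for one file."""
    changes: list[Change] = []
    header = None
    for line in output.splitlines():
        if line.startswith("@"):
            header = line[1:].split("\t", 2)
            continue
        parts = line.split("\t", 2)
        if header is None or len(header) != 3 or len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        added, deleted, _path = parts
        if added.isdigit() and deleted.isdigit():  # binary files report "-"
            changes.append(Change(header[0], header[1], header[2], int(added), int(deleted)))
    return changes


def file_history(path: str) -> list[Change]:
    """Run git for `path` and return one Change per commit, newest first."""
    out = subprocess.run(
        ["git", "log", "--follow", "--numstat", "--date=short",
         f"--format={LOG_FORMAT}", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_numstat_log(out)
```

Sorting the result by added lines usually points straight at the hotfix commits worth reading in full with git show.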

To find the commit that last touched a specific line, combine git blame with grep:

git blame src/payments/refund.ts | grep 'weird thing'

Then inspect that commit directly:

git show <commit-sha>

Understanding why a line was added is often more valuable than understanding what it does. It tells you the business pressure that produced the code, and it hints at the acceptable trade‑offs.

Commit messages and PR descriptions

A well‑run repository treats the commit log as a narrative. Before you stare at any source file, pull a list of recent PRs that touched the high‑level modules you care about:

gh pr list --state merged --search 'merged:>2024-01-01' --limit 30 --json title,url,body

Read the titles and bodies of the latest 20‑30 PRs. Good authors include:

Why the change was needed (business requirement, bug, performance regression).
What the public API contract is after the change.
Any migration steps required by downstream consumers.

If the PR description references a ticket in your issue tracker, open that ticket. The combination of ticket, PR, and commit diff is a complete design rationale. Skipping this step forces you to infer intent from code alone, which is a recipe for misinterpretation.
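Once the PR list is saved as JSON, a few lines of Python can pull out the ticket references and flag PRs with empty descriptions. This is a sketch under assumptions: the ticket pattern assumes Jira-style keys such as PAY-123, which may not match your tracker, and summarize_prs is a name I made up.

```python
import json
import re

# Assumed ticket format: uppercase project key, dash, number (e.g. PAY-123).
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def summarize_prs(raw_json: str) -> list[dict]:
    """Summarize PR records exported with the fields title, url, and body."""
    summary = []
    for pr in json.loads(raw_json):
        body = pr.get("body") or ""
        summary.append({
            "title": pr["title"],
            "url": pr["url"],
            "tickets": sorted(set(TICKET_RE.findall(body))),
            "has_description": bool(body.strip()),
        })
    return summary
```

PRs where has_description is false are the ones where you will have to fall back on the diff and git history for intent.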

When to give up

Not every codebase is salvageable. When you encounter a repository that:

Has no type information (no .d.ts, no mypy stubs, no Flow/TS).
Lacks any test coverage for core modules.
Shows a commit history that ends five years ago with the original authors no longer reachable.

the cost of incremental understanding can exceed the benefit. In those cases, adopt a “read‑the‑public‑API‑and‑replace” strategy:

Write a thin wrapper that mirrors the exported functions or classes.
Implement the wrapper in clean, typed code that you own, delegating to the old module only where you must.
Gradually migrate callers to the wrapper, deprecating the old module.

This approach isolates the technical debt and gives you a testable contract for future work.
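A minimal sketch of such a wrapper, assuming the legacy module exposes an untyped function that takes a dict and returns a list. Both legacy_calc_refund and its data shape are invented here to stand in for the real thing.

```python
from dataclasses import dataclass


def legacy_calc_refund(order):
    # Hypothetical stand-in for the untyped legacy module: takes a dict,
    # returns a loosely structured list.
    amt = order["amt"] - order.get("fee", 0)
    return [order["id"], amt if amt > 0 else 0]


@dataclass(frozen=True)
class Refund:
    order_id: str
    amount_cents: int


def calculate_refund(order_id: str, amount_cents: int, fee_cents: int = 0) -> Refund:
    """Typed facade; callers depend on this signature, not on the legacy one."""
    if amount_cents < 0 or fee_cents < 0:
        raise ValueError("amounts must be non-negative")
    raw_id, raw_amount = legacy_calc_refund(
        {"id": order_id, "amt": amount_cents, "fee": fee_cents}
    )
    # Normalize the loose legacy result into the typed contract.
    return Refund(order_id=str(raw_id), amount_cents=int(raw_amount))
```

Because callers only see calculate_refund and Refund, the body can later be reimplemented without the legacy call and nothing downstream has to change.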

Set up a REPL or debugger early

Static reading is only half the battle. Spin up a REPL that loads the project’s runtime context: node -r ts-node/register for TypeScript or python -i -m myapp for Python. Go has no built-in REPL, so start the program under a debugger such as Delve instead (dlv debug ./cmd/app). Inspect objects, call functions, and observe real values. When you hit a line you don’t understand, pause the debugger, dump the relevant locals, and compare them to the test expectations you gathered earlier.

In practice, I keep a one‑page cheat sheet per project:

Command to start the REPL.
How to import the entry module.
Typical environment variables required for a full stack (e.g., DATABASE_URL, REDIS_URL).

This cheat sheet reduces the friction of “spin up the app, then read the code” to a single, repeatable workflow.
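For a Python project, the cheat sheet can even be a small script. The sketch below checks the environment variables named above, imports the entry module, and drops into an interactive session; missing_env and start_repl are my own helper names, and "myapp" is the same placeholder module name used earlier.

```python
import code
import importlib
import os

REQUIRED_ENV = ("DATABASE_URL", "REDIS_URL")


def missing_env(required=REQUIRED_ENV, env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def start_repl(entry_module: str) -> None:
    """Fail fast on missing configuration, then open a REPL with the app loaded."""
    missing = missing_env()
    if missing:
        raise SystemExit(f"Set these variables first: {', '.join(missing)}")
    app = importlib.import_module(entry_module)
    code.interact(banner=f"{entry_module} loaded as `app`", local={"app": app})


# Usage, from the project root:
#   start_repl("myapp")
```

Failing fast on missing variables saves the more confusing failure you would otherwise hit minutes later, deep inside a database driver.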

Document as you go

Never assume someone else will write the documentation you need. As soon as you resolve a confusing construct, add a comment or a markdown note in a docs/architecture folder. Use a consistent format:

## Module: payments/refund.ts
- Purpose: handle refund workflow for card payments.
- Key invariants:
  * Refund amount never exceeds original charge.
  * Idempotency key is stored in `refunds.idempotency_key`.
- Open questions:
  * Why do we call `externalGateway.refund` before persisting?

Store these notes in the same repository; they travel with the code and survive branch changes. Over time the collection becomes a low‑maintenance knowledge base that prevents future engineers from repeating the same “start‑from‑scratch” cycle.