Praxor Lab
Mech Interp / RL

We work on what makes large models tick.

An independent research lab studying the internals of large language models: how they reason, how they adapt, and how reinforcement learning rewires them. Papers in the open. Loop, in development. Looking for collaborators.

a.04
Lead Metric
2
Papers in Progress

drafted in the open, with code and data released alongside

a.01
4
Research Tracks
Interpretability, reasoning, adaptation, RL
a.02
100%
Open Source
Code, data, and notebooks released
a.03
1
Product in Development
Loop · RL training infra with interp built in
§01 About the Lab

Independent research, in the open.

Praxor Lab studies the internals of large language models: how they reason, how they adapt, and how reinforcement learning rewires them. We write it up in the open.

Two papers in draft. Loop in development. The work runs on open code, careful interventions, and claims sized to evidence.

Fig. A · Areas of focus · 04 / 04
  01 Mechanistic Interpretability
    Causal interventions, activation analysis, and circuit-level methods. We figure out what models actually do, not what they appear to do.
  02 Reasoning & Chain-of-Thought
    Step-level analysis of how language models reason, and whether the trace they write is causally responsible for the answer they give.
  03 Parameter-Efficient Adaptation
    Low-rank methods, subspace adaptation, and principled rank allocation. Adapt large pretrained models without retraining from scratch.
  04 Open by Default
    Code, datasets, and analysis notebooks released alongside the paper. Honest reporting of negative results, not just headlines.
§02 Research Areas

What we're working on.

Two papers are in active development: one on the internals of reasoning models, one on parameter-efficient adaptation. The tracks below are the directions those papers grow into. Each is meant to produce a small, verifiable result, not a manifesto.

R-01

Mechanistic Interpretability

Probing the internal computations of large language models. Causal interventions, activation analysis, and circuit-level methods to understand which parts of a model are actually doing the work.

Causal Intervention · Activation Patching · Circuit Analysis · Probing
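
As a rough illustration of the activation patching named in this track, the sketch below runs a toy two-layer network on a "clean" and a "corrupted" input, copies one hidden activation at a time from the clean run into the corrupted run, and reports how much of the output gap each patch restores. The network, its weights, and the function names are all hypothetical; real work patches attention heads or MLP outputs inside a transformer, not a three-unit toy.

```python
# Toy 2-layer network. Weights are made up for illustration only.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # 3 hidden units, 2 inputs
W2 = [1.0, 0.2, -1.0]                         # linear readout to 1 output


def hidden(x):
    """ReLU hidden activations for input vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Linear readout from hidden activations."""
    return sum(w * hi for w, hi in zip(W2, h))


def patch_effects(clean_x, corrupt_x):
    """For each hidden unit, copy its clean activation into the corrupted
    run and return the fraction of the clean-vs-corrupt gap restored."""
    h_clean, h_corrupt = hidden(clean_x), hidden(corrupt_x)
    y_corrupt = output(h_corrupt)
    gap = output(h_clean) - y_corrupt
    effects = []
    for i in range(len(h_corrupt)):
        patched = list(h_corrupt)
        patched[i] = h_clean[i]  # the intervention: one unit, one run
        effects.append((output(patched) - y_corrupt) / gap if gap else 0.0)
    return effects


effects = patch_effects(clean_x=[2.0, 0.0], corrupt_x=[0.0, 2.0])
for unit, effect in enumerate(effects):
    print(f"hidden unit {unit}: restores {effect:.0%} of the gap")
```

In this toy, units 0 and 2 each restore half the gap and unit 1 restores none, because its activation is identical in both runs. That is the kind of "which part is doing the work" signal the method is meant to surface.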
R-02

Reasoning & Chain-of-Thought

How language models reason step by step, and whether the written trace is causally responsible for the answer. We compare prompted and trained reasoners and measure step-level effects.

CoT Faithfulness · RL-Trained Reasoners · Step-Level Causality · Trace Analysis
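
One simple way to ask whether a written step matters is to corrupt it and check whether the answer changes. The sketch below shows only that harness. `toy_reasoner` is a hypothetical stand-in for a model call, and blanking a step is just one possible intervention; neither is taken from the lab's papers.

```python
def step_effects(steps, answer_fn, corrupt=lambda step: ""):
    """Return one boolean per step: True if corrupting that step alone
    changes the final answer produced by answer_fn."""
    baseline = answer_fn(steps)
    effects = []
    for i in range(len(steps)):
        edited = steps[:i] + [corrupt(steps[i])] + steps[i + 1:]
        effects.append(answer_fn(edited) != baseline)
    return effects


def toy_reasoner(steps):
    """Hypothetical stand-in for a model: the answer is the sum of the
    integers in steps of the form 'add N'; all other steps are ignored."""
    total = 0
    for step in steps:
        parts = step.split()
        if len(parts) == 2 and parts[0] == "add" and parts[1].isdigit():
            total += int(parts[1])
    return total


trace = ["add 3", "note: the sky is blue", "add 4"]
effects = step_effects(trace, toy_reasoner)
for step, matters in zip(trace, effects):
    print(f"{step!r}: {'changes the answer' if matters else 'no effect'}")
```

With a real model, `answer_fn` would re-run generation from the edited trace, and the comparison would usually be a probability shift over many samples, not a single equality check.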
R-03

Parameter-Efficient Adaptation

Adapting large pretrained models to downstream tasks without retraining from scratch. Low-rank decompositions, subspace methods, and rank allocation guided by the model's own spectrum.

Low-Rank Adaptation · Subspace Methods · Spectral Analysis · Rank Allocation
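
To make "rank allocation guided by the model's own spectrum" concrete, here is a minimal sketch of one common heuristic: give each layer the smallest rank whose top singular values capture a fixed share of the squared spectrum. The energy threshold, the layer names, and the singular values are assumptions for illustration, and the lab's actual allocation rule may differ.

```python
def rank_for_energy(singular_values, energy=0.9):
    """Smallest rank r such that the top-r squared singular values reach
    the given fraction of the total squared spectrum."""
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0:
        return 0
    running = 0.0
    for r, value in enumerate(squared, start=1):
        running += value
        if running / total >= energy:
            return r
    return len(squared)


def allocate_ranks(spectra, energy=0.9, max_rank=None):
    """Map each layer name to an adapter rank, optionally capped."""
    ranks = {}
    for name, values in spectra.items():
        r = rank_for_energy(values, energy)
        ranks[name] = min(r, max_rank) if max_rank is not None else r
    return ranks


# Hypothetical spectra: one sharply decaying, one nearly flat.
spectra = {
    "attn.q": [10.0, 1.0, 0.5, 0.1],
    "mlp.up": [4.0, 3.0, 2.0, 1.0],
}
ranks = allocate_ranks(spectra, energy=0.9)
print(ranks)
```

The sharply decaying layer gets rank 1 and the flat one gets rank 3, so the adapter budget goes where the spectrum says it is needed.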
R-04

Open Empirical Practice

Reproducible experiments, released code, and claims sized to evidence. Negative results count. Ablations are load-bearing.

Open Code · Reproducibility · Ablation Studies · Honest Reporting
§03 Praxor / Loop · Product
In development · Targeting Q3 2026

RL training infrastructure,
with the model interior on by default.

An opinionated stack for post-training and model specialization. Training APIs, managed datasets, and research-native environments, with circuit-level traces of what your reward signal is doing to the model on every checkpoint.

Built by interpretability researchers, for teams who care which circuits their reward is actually moving.

03.1 The problem we're solving
01 Broken SDKs

Thirty minutes SSH'd into a fresh GPU box before you find out the driver is wrong, the CUDA version is wrong, or the trainer hangs at step zero.

02 Unmanaged data

Terabytes of trajectories on object storage with no dedup, no eval index, no version. Someone writes a new ETL every time you want to rerun an ablation.

03 Black-box runs

Loss goes down, eval goes up, and you have no idea which circuits the reward moved. Reward hacking only shows up later, in deployment.

03.2 The product · three pillars
01

Training API

PPO, GRPO, DPO, and RLHF behind one SDK. Reward shaping, KL control, and checkpoint-level interpretability turned on by default. Not a library you bolt on.

Surface
loop.train(recipe='grpo', base='llama-3.1-70b')
loop.reward(fn=my_scorer, kl_target=0.05)
loop.checkpoint(every=200, with_circuits=True)
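
The `kl_target=0.05` argument suggests a KL budget against the base model. One widely used way to hold such a target in PPO-style training is an adaptive penalty coefficient that grows when measured KL overshoots and shrinks when it undershoots. The sketch below follows that general scheme. It is not Loop's implementation, and the class name, `init_coef`, and `horizon` are assumptions.

```python
class AdaptiveKLController:
    """Proportional controller for the KL penalty coefficient."""

    def __init__(self, init_coef=0.2, target=0.05, horizon=10000):
        self.coef = init_coef    # current penalty weight on KL
        self.target = target     # desired KL to the base model
        self.horizon = horizon   # larger horizon means slower adjustment

    def update(self, observed_kl, n_steps):
        # Relative error, clipped so one noisy batch cannot swing the coef.
        error = max(-0.2, min(0.2, observed_kl / self.target - 1.0))
        self.coef *= 1.0 + error * n_steps / self.horizon
        return self.coef


def penalized_reward(reward, kl, coef):
    """Task reward minus the KL penalty applied during training."""
    return reward - coef * kl


controller = AdaptiveKLController(init_coef=0.2, target=0.05, horizon=1000)
too_far = controller.update(observed_kl=0.10, n_steps=100)   # KL above target
print(f"coef after overshoot: {too_far:.4f}")
back_off = controller.update(observed_kl=0.02, n_steps=100)  # KL below target
print(f"coef after undershoot: {back_off:.4f}")
```

The clipping range and horizon control how quickly the penalty reacts. A fixed coefficient is the simpler alternative when the KL budget does not need to be tracked closely.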
02

Managed Datasets

Versioned trajectory storage at terabyte scale. Dedup, filter, mix, and slice without writing a new pipeline. Eval sets and rollouts share one index.

Surface
loop.dataset.attach('trajectories.parquet')
loop.dataset.dedup(by='prompt_hash')
loop.dataset.mix(weights={...})
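
For intuition about what deduplicating `by='prompt_hash'` could involve, the sketch below hashes a normalized prompt and keeps the first rollout seen for each hash. The normalization rule, the record fields (`prompt`, `response`), and the keep-first policy are assumptions, not a description of Loop's dataset layer.

```python
import hashlib


def prompt_hash(prompt):
    """SHA-256 of the prompt after collapsing whitespace and lowercasing.
    The normalization is an illustrative choice, not a Loop guarantee."""
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_by_prompt(rollouts):
    """Keep the first rollout for each distinct prompt hash, in order."""
    seen = set()
    kept = []
    for rollout in rollouts:
        key = prompt_hash(rollout["prompt"])
        if key not in seen:
            seen.add(key)
            kept.append(rollout)
    return kept


rollouts = [
    {"prompt": "Sort this list", "response": "a"},
    {"prompt": "sort   this list ", "response": "b"},  # same after normalizing
    {"prompt": "Reverse this list", "response": "c"},
]
kept = dedup_by_prompt(rollouts)
print(len(rollouts), "->", len(kept))
```

Keeping one rollout per prompt suits eval-style data. Recipes that rely on several samples per prompt would more likely key on prompt plus response instead.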
03

Research Environments

Pre-warmed GPU pools with research-native images. Notebooks, training, and eval share one environment. Every node is preflighted before it reaches you.

Surface
loop env launch --gpus 8xH100
loop env attach --notebook
loop env preflight   # ✓ all nodes healthy
03.3 What's different · interp on by default

Every checkpoint comes with a circuit diff.

Your reward shaped a circuit. Loop tells you which one, every run, as a default artifact. Interpretability is in the CI loop, not an opt-in plugin.

  • 01 Activation drift maps relative to the base model, per checkpoint.
  • 02 Reward-hacking probes on held-out adversarial sets, scored continuously.
  • 03 Step-level causal attribution for chain-of-thought rollouts.
  • 04 Circuit deltas: which attention heads and MLP features the reward actually moved.
03.4 What a circuit diff actually looks like
circuit-diff · my-rl-run / ckpt-final
vs. base llama-3.1-70b
Heads moved: 14
MLP features: 2
Flagged for review: 1
KL to base: 0.04
Circuit deltas · top 7 by magnitude
  • L23.H07  +0.182  reward-aligned
  • L24.H02  +0.141  reward-aligned
  • L25.H11  +0.118  reward-aligned
  • L26.H03  +0.106  reward-aligned
  • L28.H04  +0.094  reward-aligned
  • L31.MLP.f2104  +0.224  drift · review
  • L31.MLP.f0871  +0.061  neutral
Probes & evals · held-out
  • Reward-hacking probe: clean (held-out · n=2k)
  • Truthfulness (TruthfulQA): 0.74 → 0.79 (Δ +0.05)
  • Sycophancy: 0.18 → 0.21 (Δ +0.03 · watch)
  • Refusal regression: 0.02 → 0.02 (no change)

Generated automatically with every checkpoint. No instrumentation in your training loop. Available as JSON, in the dashboard, or via loop diff <ckpt>.
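
Since the diff is described as available as JSON, a downstream check might look like the sketch below. The schema is entirely hypothetical: the field names (`deltas`, `component`, `delta`, `label`) are assumptions, and only the values are taken from the example diff shown in this section.

```python
import json

# Hypothetical report shape. Field names are assumptions; the numbers
# mirror the example circuit diff on this page.
REPORT_JSON = """
{
  "run": "my-rl-run",
  "checkpoint": "ckpt-final",
  "base": "llama-3.1-70b",
  "kl_to_base": 0.04,
  "deltas": [
    {"component": "L23.H07", "delta": 0.182, "label": "reward-aligned"},
    {"component": "L24.H02", "delta": 0.141, "label": "reward-aligned"},
    {"component": "L25.H11", "delta": 0.118, "label": "reward-aligned"},
    {"component": "L26.H03", "delta": 0.106, "label": "reward-aligned"},
    {"component": "L28.H04", "delta": 0.094, "label": "reward-aligned"},
    {"component": "L31.MLP.f2104", "delta": 0.224, "label": "drift"},
    {"component": "L31.MLP.f0871", "delta": 0.061, "label": "neutral"}
  ]
}
"""


def top_by_magnitude(report, k):
    """Names of the k components with the largest absolute delta."""
    ranked = sorted(report["deltas"], key=lambda d: abs(d["delta"]), reverse=True)
    return [d["component"] for d in ranked[:k]]


def needs_review(report, label="drift"):
    """Names of components carrying the given review label."""
    return [d["component"] for d in report["deltas"] if d["label"] == label]


report = json.loads(REPORT_JSON)
print("largest moves:", top_by_magnitude(report, 2))
print("flagged:", needs_review(report))
```

A check like this could gate a training pipeline, for example by failing a run when any component is flagged, though how Loop exposes that is not specified here.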

03.5 How Loop compares

There are good open-source RL trainers. None of them ship a managed dataset layer, a preflighted GPU environment, and circuit-level interpretability in the same box. Loop is the opinionated stack — not a piece of one.

Capability
Praxor / Loop
TRL
OpenRLHF
Roll your own
PPO / GRPO / DPO / RLHF
Single SDK · shared recipes
Per-trainer classes
PPO + GRPO
Build it
Trajectory dataset management
Versioned · deduped · mixable
BYO
BYO
Build it
GPU env preflight
Pre-warmed · preflighted
SSH and pray
Circuit-level interp
Default · every checkpoint
Bolt-on
Reward-hacking probes
Continuous · held-out
Bolt-on
Support
Direct line to engineers
GitHub issues
GitHub issues
You

Reflects publicly available capabilities as of Q2 2026. We're happy to be wrong — tell us at loop@praxorlab.com.

03.6 In practice
~/loop · my-rl-run · v0.4.1-alpha
$ loop init my-rl-run --base llama-3.1-70b
→ env: 8×H100 ready (preflight passed in 11s)
→ base: llama-3.1-70b · spectrum cached
$ loop dataset attach trajectories.parquet
→ ingested 14.2M rollouts · deduped to 9.8M
$ loop train --recipe grpo --steps 4000
→ step 0400  loss 1.84 → 1.62  kl 0.04
→ step 2000  eval pass@1: 0.412 → 0.491
→ step 4000  eval pass@1: 0.491 → 0.508
✓ circuits moved: 14 attention heads in L23–L28
✓ reward-hacking probe: clean (held-out, n=2k)
! drift on L31.MLP: flagged for review
→ checkpoint: loop://runs/my-rl-run/ckpt-final
03.7 Access

In development. Looking for design partners.

For teams running serious post-training. Tell us what you're training and where the tooling is breaking, and we'll keep you in the loop as the alpha opens up.

Stage
Pre-alpha · design-partner phase
Compute
Bring your own
Contact
Direct line to the team
§04 Products & Services

Applied work, from the same lab.

Most of what we do is research. For teams that need the same methods applied to a specific model or product, we take on a small number of engagements per quarter, built around the interpretability and adaptation work the lab is already doing.

04.1 Engagements · what we take on
PX-01 · 6–10 wks

Custom Adapter Training

Parameter-efficient fine-tuning for your domain, sized to your compute budget.

  • Low-rank adapters with principled rank allocation
  • Spectral profiling of your base model
  • Released as drop-in checkpoints with eval reports
PX-02 · 4–8 wks

Reasoning Model Audit

Mechanistic analysis of how a reasoning model is actually arriving at its answers.

  • Step-level causal interventions on chain-of-thought
  • Faithfulness measurements across your eval set
  • Failure-mode breakdown with reproducible notebooks
PX-03 · 4–8 wks

Evaluation & Stress-Test

Domain-specific evals and adversarial review, not vanity benchmarks.

  • Task-grounded eval suites with regression tracking
  • Probing for shortcut learning and confabulation
  • Reports that match the evidence behind them
PX-04 · 3–6 mo

Research Collaboration

Joint work on a focused interpretability or adaptation question, ending in a co-authored draft.

  • Scoped to a single empirical question
  • Open code, shared intermediate results
  • Aimed at a workshop or conference submission
04.2 How an engagement runs
00 · Week 0

Discovery

1-hour call plus async intake. We learn your model, your goal, and what good looks like.

01 · Week 1

Calibration

Agree the question, the eval, and the fail conditions. Scope is locked before we touch compute.

02 · Weeks 2 → N

Execution

Work in the open. Weekly working session and intermediate results in a shared notebook.

03 · Final week

Handoff

Written report, a reproducible repo, and a working session with your team.

04.3 What you get
deliverable · PX-01 · your-org · 2026-Q3
handoff manifest
Files · in your repo
engagement/
├─ report.pdf
├─ notebooks/
│  ├─ 01_baseline.ipynb
│  ├─ 02_method.ipynb
│  └─ 03_eval.ipynb
├─ checkpoints/
├─ evals/
└─ handoff.md
Terms · non-negotiables
  • 01 Lives in your infrastructure, not behind our login.
  • 02 MIT, Apache, or your terms. No lock-in.
  • 03 Reproducible end-to-end from a clean checkout.
  • 04 Direct line to the team for 30 days post-handoff.
04.4 Fit · who this is for
If this is you
  • You have a model in production (or close to it) with a specific failure mode.
  • You can share your evals, or you'd rather build them with us.
  • You want to understand the change, not just deploy a fix.
Not the right fit if
  • You're shopping vendors, not looking for a partner.
  • You want a black box delivered with no insight into how or why.
  • The work is generic LLM app development — there are good firms for that.
04.5 Start

Open to a small number of engagements.

Send a short note. We'll reply with whether the shape fits, and if it does, a 30-minute discovery call to scope it.

What to send
The model, where it sits, and the decision the work needs to support
Pricing
Fixed per engagement, scoped to the question
Compute
Bring your own, or use ours
IP
Yours. We retain the right to publish methods, never your data
§05 Join the Lab

Three ways to engage with the lab.

Two papers in active draft, Loop in development, and room for a small group of collaborators. Pick the level of involvement that fits where you are.

Flagship
P-01

Research Fellowship

Work alongside Praxor researchers on one of the papers in active draft, or scope a related question of your own. Small group, close collaboration, public output.

Duration
6 Months
Cohort
01 · 2026
  • Direct collaboration with the research team
  • Co-authorship on the paper you contribute to
  • Weekly working sessions and informal review
  • Code, data, and analysis released openly
Rolling review · Decision ≤ 2 weeks
Apply for Fellowship
P-02

Visiting Collaborator

Flexible

For researchers with their own ongoing work who want to contribute to a specific draft on a lighter time commitment.

  • Co-authorship on a single paper or section
  • Remote-friendly, project-scoped
Get in touch
P-03

Reading Group

Ongoing

An open reading group on interpretability and adaptation. For anyone who wants to follow along and discuss the work.

  • Weekly paper discussions
  • Open invitation to comment on our drafts
  • Discord for ongoing conversation
Join the group
§06 Process

How research runs at the lab.

The lab works in small, scoped projects rather than open-ended programs. Four phases over roughly six months.

Phase 01

Apply

Tell us your background and what draws you to interpretability or adaptation. No prior publications required. We look for care about the question and willingness to do real empirical work.

Phase 02

Scope

We sit down with accepted fellows and pick something to work on. Usually a contribution to a draft, sometimes a related question of your own.

Phase 03

Research

Six months of focused work. Weekly working sessions and intermediate results in the open from day one.

Phase 04

Publish

We draft together, share early for outside review, and submit to a workshop or conference. The artifact is the paper plus the code, data, and notebooks needed to reproduce it.

§07 Application

Want to join the lab?

We're looking for fellows and collaborators. Students, engineers, and self-directed researchers are all welcome. What we need most is people who care about getting the empirical work right.

Deadline
Rolling
Decision
≤ 2 weeks
Start
Q3 2026
Duration
6 months
Form F-01 · Praxor Lab · Cohort 01

Reviewed on a rolling basis. Decision within two weeks.