# Volt API - Agent Instructions

Volt predicts 3D structures of biomolecular systems. A run contains one or more jobs. A job contains one or more chains. Supported chain types are `protein`, `dna`, `rna`, `ligand`, and `ion`.

Base path: `/api/v1`

## Agent Rules

- Use only public external IDs returned by the API, such as `run_id`, `job_id`, `key.id`, `group.id`, and `user.id`. Never ask for or surface internal database IDs.
- If you need to verify biological molecule identity or sequence data, use reputable primary sources only:
  - UniProt for protein identity, accession, organism, and sequence.
  - ChEMBL and PubChem for ligand identity, synonyms, CID, and canonical SMILES.
- Do not invent sequences, SMILES, accessions, or molecule mappings. If the identity is ambiguous or conflicting, say so and ask the user before submitting a run.
- When provenance is known, record it in the chain's `source` and `source_id` fields, for example `UniProt` plus the accession or `PubChem` plus the CID.
- Treat Volt outputs as structure-prediction evidence, not biological proof.

### Pre-submission checklist — ask before building the payload

Before constructing or submitting any run, gather the following information. Raise all open questions in a single organised message rather than one at a time.

**Species**
- If a protein is identified by name or gene symbol (not a UniProt accession), ask which species/organism it comes from (e.g. human, mouse, yeast, E. coli) before looking up or assuming a sequence.
- If the user supplies a UniProt accession, confirm the organism recorded in UniProt for that entry and flag it if it differs from the user's stated intent.

**Sequence and length verification**
- After the sequence for each protein chain is known — whether supplied directly, resolved from UniProt, or derived from a construct — confirm the residue count with the user.
- If multiple protein chains are expected to be the same length (e.g. a homo-dimer or identical repeats), verify the lengths actually match and flag any discrepancy before submitting.
- If the user states an expected length (e.g. "it's about 350 aa"), check the resolved sequence against that expectation and highlight mismatches.
- Report the total structure-token count for the run so the user can anticipate cost before committing.
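
The length checks above could be sketched as a pair of small helpers. The function names and the 10% tolerance are illustrative assumptions, not part of the Volt API; adjust the tolerance to whatever "about N aa" means for the user.

```python
def length_report(chains, expected=None, tolerance=0.10):
    """Return human-readable warnings for chains whose length misses a stated value.

    chains:    mapping of chain label -> sequence string
    expected:  optional mapping of chain label -> length the user stated
    tolerance: allowed relative deviation from the stated length (assumed value)
    """
    warnings = []
    expected = expected or {}
    for label, seq in chains.items():
        stated = expected.get(label)
        if stated is not None and abs(len(seq) - stated) > tolerance * stated:
            warnings.append(
                f"{label}: resolved length {len(seq)} differs from stated ~{stated}"
            )
    return warnings


def same_length(chains, labels):
    """True when every listed chain (e.g. copies of a homo-dimer) has equal length."""
    return len({len(chains[label]) for label in labels}) <= 1
```

Any warning returned here is something to raise with the user, not a reason to silently trim or pad a sequence.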

**Post-translational modifications (PTMs)**
- Ask whether any protein chain should carry PTMs (e.g. phosphorylation, glycosylation, acetylation, methylation, lipidation, cross-links).
- Remind the user that modifications are supplied as `modifications: {"<1-based position>": "<CCD code>"}` on the chain. Supported protein CCD codes are listed in the *Submit Payload* section.
- If a UniProt accession is used, note any annotated modification sites from UniProt and ask whether any should be included.

**Isoform and processed form**
- Ask whether the full canonical sequence should be used or a processed form (e.g. signal peptide removed, propeptide cleaved, specific isoform).

**Multimer stoichiometry**
- If a complex is described, confirm the number of copies of each chain (e.g. A₂B₁ heterotrimer) so chains can be duplicated correctly in the payload.

**Additional molecule types**
- Ask whether any small-molecule ligands (SMILES) or metal ions should be co-submitted in the same job.
- If nucleic acids are involved, confirm strand orientation and whether a complementary strand should be added as a separate chain.

## Authentication

- Send the API key in `X-API-Key: <token>` or `Authorization: Bearer <token>`.
- Tokens must be 128 alphanumeric characters.
- Call `GET /api/v1/` first to validate the key and inspect permissions.
- Permissions are `read` and `write`. `write` keys can use read endpoints too.
- Keys are scoped to a single group. List, status, cancel, and output operations are restricted to that group.
- Rate limits are enforced per key. If you receive `429`, obey `Retry-After` and `X-RateLimit-*` headers instead of hardcoding a fixed poll cadence.
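
A minimal sketch of the header and rate-limit rules above. The helper names are hypothetical, and the sketch assumes `Retry-After` carries a number of seconds; this guide does not specify its format, so an unparseable value falls back to a default.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9]{128}")


def auth_headers(token, use_bearer=False):
    """Build request headers for either supported authentication style."""
    if not TOKEN_RE.fullmatch(token):
        raise ValueError("API token must be exactly 128 alphanumeric characters")
    if use_bearer:
        return {"Authorization": f"Bearer {token}"}
    return {"X-API-Key": token}


def retry_delay(headers, default=5.0):
    """Seconds to wait after a 429, taken from Retry-After when it is numeric.

    `headers` is a plain dict here, so the lookup is case-sensitive; HTTP
    client libraries usually expose a case-insensitive mapping instead.
    """
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, float(value))
    except ValueError:
        return default  # header absent or not a plain number; fall back
```

Checking the token format locally avoids spending a request on a key that cannot be valid.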

## Response Format

- Most application responses use:

```json
{"success": true, "message": "some_message", "data": {}}
```

- Authentication and rate-limit failures may instead return FastAPI error bodies such as:

```json
{"detail": "invalid_api_key"}
```

- Public IDs are 16-character alphanumeric strings.
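
The two body shapes above can be handled in one place. This is a sketch with hypothetical helper names; it assumes that a failed application response carries `"success": false`, which the guide implies but does not state.

```python
import re

PUBLIC_ID_RE = re.compile(r"[A-Za-z0-9]{16}")


def unwrap(body):
    """Return the `data` payload of a parsed JSON body, or raise with the server's message."""
    if "detail" in body and "success" not in body:
        # FastAPI-style error body, e.g. {"detail": "invalid_api_key"}
        raise RuntimeError(f"request rejected: {body['detail']}")
    if not body.get("success"):
        raise RuntimeError(f"request failed: {body.get('message')}")
    return body.get("data", {})


def is_public_id(value):
    """True for a 16-character alphanumeric public ID."""
    return isinstance(value, str) and PUBLIC_ID_RE.fullmatch(value) is not None
```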

## Recommended Flow

1. `GET /api/v1/`
2. `POST /api/v1/run/submit` (or `POST /api/v1/run/screen` for pairwise screens)
3. Poll `POST /api/v1/run/status`
4. Inspect individual jobs with `POST /api/v1/run/list/jobs` or `POST /api/v1/job/info`
5. After a job is `complete`, call either `GET /api/v1/job/{job_id}/download` for a ZIP archive or `POST /api/v1/job/output/list` to enumerate individual artifacts
6. Fetch specific artifacts with `POST /api/v1/job/output/get` when you do not want the full archive

To stop a run early, call `POST /api/v1/run/cancel`. Its response includes `cancelled_job_count`, `active_job_count`, `refunded_token_count`, and `refunded_api_job_count`.

## Endpoints

| Method | Path | Permission | Purpose |
| --- | --- | --- | --- |
| `GET` | `/api/v1/agent` | none | This markdown guide. |
| `GET` | `/api/v1/` | valid key | Validate key and inspect key, user, and group metadata. |
| `POST` | `/api/v1/run/submit` | `write` | Submit a run with explicit jobs. |
| `POST` | `/api/v1/run/screen` | `write` | Submit a pairwise screen (list A × list B). |
| `POST` | `/api/v1/run/list` | `read` or `write` | List runs for the key's group. |
| `POST` | `/api/v1/run/status` | `read` or `write` | Get run status summary. |
| `POST` | `/api/v1/run/info` | `read` or `write` | Get run metadata and counts. |
| `POST` | `/api/v1/run/list/jobs` | `read` or `write` | List jobs for a run. |
| `POST` | `/api/v1/run/cancel` | `write` | Cancel a run in the same group. |
| `POST` | `/api/v1/job/status` | `read` or `write` | Get job status summary. |
| `POST` | `/api/v1/job/info` | `read` or `write` | Get job metadata and original input payload. |
| `GET` | `/api/v1/job/{job_id}/download` | `read` or `write` | Download a ZIP archive of completed job outputs. |
| `POST` | `/api/v1/job/output/list` | `read` or `write` | List downloadable artifacts for a completed job. |
| `POST` | `/api/v1/job/output/get` | `read` or `write` | Download one artifact by output ID. |

## Submit Payload

Use `jobs[].chains[]`.

Protein chains can be specified by UniProt accession alone — the server fetches the canonical sequence and fills in `type`, `value`, `name`, `source`, and `source_id` automatically:

```json
{
  "name": "TP53 + MDM2 screen",
  "description": "Example using UniProt IDs",
  "jobs": [
    {
      "name": "TP53 + MDM2",
      "chains": [
        {"uniprot_id": "P04637"},
        {"uniprot_id": "Q00987"}
      ]
    }
  ]
}
```

Or supply the sequence directly with explicit provenance:

```json
{
  "name": "TP53 complex screen",
  "description": "Example API submission",
  "notify_by_email": false,
  "jobs": [
    {
      "name": "TP53 + ligand",
      "chains": [
        {
          "name": "TP53",
          "type": "protein",
          "value": "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP",
          "source": "UniProt",
          "source_id": "P04637"
        },
        {
          "name": "aspirin",
          "type": "ligand",
          "value": "CC(=O)OC1=CC=CC=C1C(=O)O",
          "source": "PubChem",
          "source_id": "2244"
        }
      ]
    }
  ]
}
```

Top-level fields:

- `name`: required string, 4-128 chars.
- `description`: required string, 8-256 chars.
- `notify_by_email`: optional boolean, default `false`.
- `jobs`: required array, 1-50,000 items.

Job fields:

- `name`: required string, 1-128 chars.
- `chains`: required non-empty array.

Chain fields:

- `type`: required unless `uniprot_id` is set, one of `protein`, `dna`, `rna`, `ligand`, `ion`.
- `value`: required sequence, SMILES, or ion CCD code depending on type. Omit when using `uniprot_id`.
- `name`: optional, max 128 chars. When using `uniprot_id` and this is omitted, the primary gene symbol from UniProt is used.
- `source`: optional provenance label, max 256 chars. Auto-set to `UniProt` when using `uniprot_id`.
- `source_id`: optional provenance ID, max 256 chars. Auto-set to the accession when using `uniprot_id`.
- `modifications`: optional object for `protein`, `dna`, or `rna` only. Keys are 1-based positions as strings and values are CCD modification codes. Can be combined with `uniprot_id`.
- `uniprot_id`: optional UniProt accession (e.g. `P04637`). When set, the server resolves `type`, `value`, `name`, `source`, and `source_id` from UniProt. Cannot be combined with `value` or a non-protein `type`.

Type-specific validation:

- `protein`: sequence must use `ACDEFGHIKLMNPQRSTVWY`, length >= 2.
- `dna`: sequence must use `ACGT`, length >= 2.
- `rna`: sequence must use `ACGU`, length >= 2.
- `ligand`: value must be a valid SMILES with at least one heavy atom.
- `ion`: value must be one of `CA`, `CL`, `CO`, `CU`, `FE`, `K`, `MG`, `MN`, `NA`, `ZN`.
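
The rules above can be checked client-side before submitting. This is a rough pre-flight sketch, not the server's validator: `chain_errors` is a hypothetical name, lowercase letters are treated as invalid (the guide does not say whether the server normalises case), SMILES validity is not checked, and `uniprot_id` chains are out of scope because the server resolves them.

```python
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ACGT"),
    "rna": set("ACGU"),
}
IONS = {"CA", "CL", "CO", "CU", "FE", "K", "MG", "MN", "NA", "ZN"}


def chain_errors(chain):
    """Return a list of problems found in one explicit chain dict (empty when none)."""
    ctype, value = chain.get("type"), chain.get("value", "")
    if ctype in ALPHABETS:
        errors = []
        if len(value) < 2:
            errors.append(f"{ctype} sequence must have length >= 2")
        bad = sorted(set(value) - ALPHABETS[ctype])
        if bad:
            errors.append(f"{ctype} sequence has invalid characters: {''.join(bad)}")
        return errors
    if ctype == "ion":
        return [] if value in IONS else [f"unsupported ion code: {value!r}"]
    if ctype == "ligand":
        # Real SMILES validation needs a cheminformatics toolkit; only emptiness is checked.
        return [] if value.strip() else ["ligand SMILES is empty"]
    return [f"unknown chain type: {ctype!r}"]
```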

Supported modification codes:

- Protein: `ACY ALY CME CSD CSO HYP KCX M2L M3L MMA MLY MSE PCA PTR PYL SEC SEP SME TPO`
- DNA: `1MA 1MG 2MG 5CA 5FC 5HC 5MC 5MU 6MA 7MG DC5 DI DOC DU H5U I6A M2G OMG`
- RNA: `1MA 1MG 2MG 4SU 5CA 5FC 5MU 7MG H2U H5U I6A INO M2G OMC OMG PSU Y`

`ligand` and `ion` chains do not support `modifications`.
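
A sketch of a client-side check for a `modifications` object, using the code lists above. `modification_errors` is a hypothetical helper. Requiring each position to fall inside the sequence is an assumption (the guide only says positions are 1-based), and the sketch does not check that the residue at that position is chemically compatible with the code.

```python
SUPPORTED_MODS = {
    "protein": "ACY ALY CME CSD CSO HYP KCX M2L M3L MMA MLY MSE PCA PTR PYL SEC SEP SME TPO".split(),
    "dna": "1MA 1MG 2MG 5CA 5FC 5HC 5MC 5MU 6MA 7MG DC5 DI DOC DU H5U I6A M2G OMG".split(),
    "rna": "1MA 1MG 2MG 4SU 5CA 5FC 5MU 7MG H2U H5U I6A INO M2G OMC OMG PSU Y".split(),
}


def modification_errors(chain_type, length, modifications):
    """Check a `modifications` object against the chain type and sequence length."""
    if chain_type not in SUPPORTED_MODS:
        return [f"{chain_type} chains do not support modifications"]
    errors = []
    for position, code in modifications.items():
        is_index = isinstance(position, str) and position.isascii() and position.isdigit()
        if not (is_index and 1 <= int(position) <= length):
            errors.append(f"position {position!r} is not a 1-based index within 1..{length}")
        if code not in SUPPORTED_MODS[chain_type]:
            errors.append(f"{code!r} is not a supported {chain_type} modification code")
    return errors
```

For example, `modification_errors("protein", 60, {"15": "SEP"})` returns an empty list for a phosphoserine at position 15 of a 60-residue chain.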

## Screen Payload

`POST /api/v1/run/screen` is a shorthand for pairwise interaction screens. Instead of enumerating every job manually, supply two lists of chain specs and the server generates one job per A×B pair.

```json
{
  "name": "MDM2 inhibitor screen",
  "description": "Screen MDM2 against a compound library",
  "list_a": [
    {"uniprot_id": "Q00987"},
    {"uniprot_id": "P04637"}
  ],
  "list_b": [
    {"name": "aspirin", "type": "ligand", "value": "CC(=O)OC1=CC=CC=C1C(=O)O", "source": "PubChem", "source_id": "2244"},
    {"uniprot_id": "P12931"}
  ]
}
```

The above produces 4 jobs: `Q00987 vs aspirin`, `Q00987 vs P12931`, `P04637 vs aspirin`, `P04637 vs P12931`.

Top-level fields are the same as `/run/submit` (`name`, `description`, `notify_by_email`) except `jobs` is replaced by:

- `list_a`: required array of chain specs, 1–10,000 entries.
- `list_b`: required array of chain specs, 1–10,000 entries.

Each entry in `list_a` and `list_b` is a single chain spec using the same fields as chains in `/run/submit` (including `uniprot_id` shorthand). The total number of pairs (`len(list_a) × len(list_b)`) must not exceed 50,000.

Job names are auto-generated as `"{label_a} vs {label_b}"` where the label for each entry is taken from (in priority order): the chain's `name` field, its `uniprot_id`, or a positional fallback (`A1`, `B2`, etc.).

The endpoint returns immediately with `run_id`, `job_count`, `token_cost`, and `jobs_left`. The run starts in `new` status — poll `POST /api/v1/run/status` until it reaches a terminal state, the same as after `/run/submit`.

Error `invalid_screen_size_error` is returned when the product exceeds 50,000. Error `api_key_job_quota_exceeded_error` is returned when the key does not have enough remaining quota for the full screen.
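
To preview what a screen will generate before submitting it, the naming and size rules above can be mirrored locally. This is a sketch under assumptions: the function names are hypothetical, the positional fallback is taken to be the entry's 1-based position in its own list, and pairs are assumed to be generated in A-major order as in the example. The server's actual output is authoritative.

```python
def entry_label(entry, prefix, index):
    """Label priority described above: `name`, then `uniprot_id`, then position."""
    return entry.get("name") or entry.get("uniprot_id") or f"{prefix}{index}"


def screen_job_names(list_a, list_b, max_entries=10_000, max_pairs=50_000):
    """Return the job names a pairwise screen is expected to produce."""
    for name, entries in (("list_a", list_a), ("list_b", list_b)):
        if not 1 <= len(entries) <= max_entries:
            raise ValueError(f"{name} must contain 1 to {max_entries} entries")
    if len(list_a) * len(list_b) > max_pairs:
        raise ValueError("screen exceeds 50,000 pairs (invalid_screen_size_error)")
    return [
        f"{entry_label(a, 'A', i)} vs {entry_label(b, 'B', j)}"
        for i, a in enumerate(list_a, start=1)
        for j, b in enumerate(list_b, start=1)
    ]
```

The length of the returned list is also the number of jobs that will count against the key's quota.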

## Size And Cost Limits

- Each job must contain at least 2 and at most 3,500 structure tokens.
- Structure tokens are counted as:
  - protein, DNA, RNA: 1 token per residue/base
  - ligand: 1 token per heavy atom
  - ion: 1 token per ion
  - modifications: add extra tokens based on modification atom count
- Volt token cost per job:
  - `1-500` -> `1`
  - `501-1000` -> `2`
  - `1001-1300` -> `3`
  - `1301-1600` -> `4`
  - `1601-3000` -> `5`
  - `3001-3500` -> `6`

`POST /api/v1/run/submit` returns the new `run_id`, submitted `job_count`, total run `token_cost`, and remaining API-key `jobs_left`.
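
The cost tiers above reduce to a lookup over the upper bound of each band. The sketch below uses hypothetical helper names. It deliberately leaves out the extra tokens added by modifications, because this guide does not give per-code atom counts, and it expects the caller to supply ligand heavy-atom counts from a cheminformatics toolkit.

```python
from bisect import bisect_left

TIER_UPPER_BOUNDS = [500, 1000, 1300, 1600, 3000, 3500]


def job_token_cost(structure_tokens):
    """Volt token cost for one job, per the tier table above."""
    if not 2 <= structure_tokens <= 3500:
        raise ValueError("a job must contain between 2 and 3,500 structure tokens")
    return bisect_left(TIER_UPPER_BOUNDS, structure_tokens) + 1


def base_structure_tokens(chains, heavy_atoms):
    """Sum a job's structure tokens, excluding modification tokens.

    heavy_atoms maps a ligand chain's SMILES string to its heavy-atom count.
    """
    total = 0
    for chain in chains:
        if chain["type"] in ("protein", "dna", "rna"):
            total += len(chain["value"])          # 1 token per residue/base
        elif chain["type"] == "ligand":
            total += heavy_atoms[chain["value"]]  # 1 token per heavy atom
        else:
            total += 1                            # ion
    return total
```

Summing `job_token_cost` over all jobs gives an estimate to show the user before submission; the `token_cost` returned by the API remains the figure that is actually charged.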

## Statuses

Run statuses: `draft`, `new`, `queued`, `preparing`, `running`, `complete`, `failed`, `cancelled`

Job statuses: `draft`, `queued`, `preparing`, `prepared`, `running`, `complete`, `failed`, `cancelled`

The `status` filter on list endpoints is validated against these exact values; an invalid value produces a 422 response.

Both `/run/submit` and `/run/screen` return immediately with a `run_id` once the run is accepted. The run starts in `new` status. Poll `POST /api/v1/run/status` until the run reaches a terminal state: `complete`, `failed`, or `cancelled`.
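
A minimal polling sketch for the loop described above. `wait_for_run` and its parameters are hypothetical. The HTTP call is injected as `fetch_status`, because this guide does not document the request or response body of `/run/status`; the caller is assumed to extract the status string. The fixed interval is only a default and should give way to `Retry-After` whenever a `429` is received.

```python
import time

TERMINAL_STATES = {"complete", "failed", "cancelled"}


def wait_for_run(run_id, fetch_status, interval=10.0, max_polls=360, sleep=time.sleep):
    """Poll until the run reaches a terminal state and return that state.

    fetch_status(run_id) is caller-supplied: it calls POST /api/v1/run/status
    and returns the run's status string.
    """
    for _ in range(max_polls):
        status = fetch_status(run_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"run {run_id} not terminal after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.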

## Pagination

`POST /api/v1/run/list` and `POST /api/v1/run/list/jobs` support:

- `limit`
- `after`
- `sort` with `newest` or `oldest`
- `status`

Both endpoints include a `pagination` object in their response:

```json
{
  "pagination": {
    "limit": 50,
    "count": 50,
    "has_more": true,
    "next_after": "r0AbCdEfGhIjKlMn"
  }
}
```
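
One way to walk every page is sketched below. It rests on two assumptions that this guide does not spell out: that `after` accepts the previous page's `next_after` value, and that the `pagination` object sits inside the response's `data`. `iter_pages` and `fetch_page` are hypothetical names; `fetch_page` performs the POST and returns the `data` dict.

```python
def iter_pages(fetch_page, limit=50, sort="newest", status=None, **extra):
    """Yield each page's `data` dict until `has_more` is false.

    extra holds endpoint-specific fields, e.g. run_id for /run/list/jobs.
    """
    body = {"limit": limit, "sort": sort, **extra}
    if status is not None:
        body["status"] = status
    while True:
        data = fetch_page(dict(body))  # copy so callers may keep the body
        yield data
        page = data["pagination"]
        if not page.get("has_more"):
            break
        body["after"] = page["next_after"]
```

Because it is a generator, a caller can stop early without fetching the remaining pages.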

## Output Retrieval

- Convenience archive: `GET /api/v1/job/{job_id}/download`
- Optional query param `type` accepts `full`, `structure`, or `data`. Omit it to get `full`.
- `type=full` returns CIFs, per-prediction `*.analysis.json` files, and `analysis_summary.json`.
- `type=structure` returns only CIFs.
- `type=data` returns per-prediction `*.analysis.json` files plus `analysis_summary.json`.
- Per-artifact flow remains available:
  - First call `POST /api/v1/job/output/list` with `{"job_id":"<JOB_ID>"}`.
  - Then call `POST /api/v1/job/output/get` with `{"job_id":"<JOB_ID>","id":"<OUTPUT_ID>"}`.
  - `job/output/get` returns raw file bytes, not the normal JSON wrapper.

Expected output IDs:

- `p...` with `kind: "prediction"` for structure CIF files
- `p..._analysis` with `kind: "analysis"` for per-prediction analysis JSON
- `analysis_summary` with `kind: "analysis_summary"` for the job-level summary JSON

Outputs are available only after the job is complete and all expected public artifacts are present.
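
The retrieval options in this section could be wrapped as follows. Both helpers are hypothetical, and `output_ids` assumes that `job/output/list` yields a list of objects with `id` and `kind` fields, which matches the IDs described above but is not a documented response schema.

```python
from urllib.parse import urlencode

ARCHIVE_TYPES = ("full", "structure", "data")


def download_path(job_id, archive_type=None):
    """Path for the ZIP archive; omitting archive_type yields the `full` archive."""
    path = f"/api/v1/job/{job_id}/download"
    if archive_type is None:
        return path
    if archive_type not in ARCHIVE_TYPES:
        raise ValueError(f"type must be one of {ARCHIVE_TYPES}")
    return f"{path}?{urlencode({'type': archive_type})}"


def output_ids(outputs, kind):
    """IDs of listed artifacts of one kind, e.g. 'prediction' for CIF files."""
    return [item["id"] for item in outputs if item.get("kind") == kind]
```

Remember that `job/output/get` returns raw bytes, so its result should be written to disk rather than parsed as the usual JSON wrapper.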
