What This Project Demonstrated

Bill Paseman's case is a proof of concept for patient-led precision medicine research — a model where the patient, not the institution, drives deep multi-omic investigation of their own condition. Over the course of this project (spanning a 2018 hackathon and 2026 AI analysis), the following was accomplished without institutional support:

Phase 0. Foundation: Know What You're Working With
Collect your raw data before anything else. Without raw files, analysis is impossible.

Step 1. Obtain Your Raw Genomic Data

The single most important step. You have a legal right to request your genomic data files under HIPAA in the US and GDPR in Europe. Don't accept only a PDF report — request the underlying VCF, BAM/CRAM, or counts files.

Data Type | What to Ask For | Who Has It
Germline WGS / WES | VCF files (.vcf or .vcf.gz), BAM/CRAM alignment files | Your genetics lab, Nebula Genomics, Dante Labs, Genome Medical, research biobanks
Tumor sequencing | Somatic VCF, RNA-seq counts, CNV files | Your hospital's pathology or molecular tumor board
Clinical genomic panel | PDF report + raw VCF | FoundationOne CDx, Tempus xT, Caris MI Profile, Guardant
Microarray / SNP chip | Raw CEL files or VCF | 23andMe, AncestryDNA (limited utility for rare disease)

Key Insight from Bill's Case
The 2018 hackathon used BGI whole blood sequencing (germline) and tumor RNA-seq + somatic variant calling from archived nephrectomy tissue. Archived surgical tissue is often stored in formalin-fixed paraffin-embedded (FFPE) blocks for 10+ years. You can request that these be re-sequenced — most patients don't know this option exists.

Step 2. Assemble Your Complete Clinical Records

Collect all pathology reports, operative notes, radiology reports, lab results, and genetics consult notes. Most patients dramatically underestimate how much data they have scattered across institutions.

  • Use patient portals (for example, Epic's MyChart) to download structured data
  • Submit written records requests to each institution you've been seen at
  • Organize chronologically — date of diagnosis is your anchor point
  • Flag discrepancies between institutions (imaging reports often disagree on measurements)

Phase 1. Build Your Patient Profile
Create a single structured document that captures your complete medical story.

Step 3. Create a Structured Patient Profile Document

A structured document (plain text, JSON, or Word) that puts your entire medical story in one place. This serves two purposes: (1) gives AI systems the context they need to reason correctly about your case, and (2) becomes your permanent research record, independent of any EHR.

What to include:

  • Demographics and your role as a patient-researcher
  • Primary diagnoses with staging, dates, institutions, and current status
  • Comorbidities with ICD codes and current management
  • Treatment history with outcomes
  • Germline genetics findings (variants, zygosity, population frequency, clinical significance)
  • Somatic tumor findings (if available)
  • Labs and imaging summaries
  • Open clinical questions you want answered

The file bill_paseman_patient_profile2.json from this project is a working template you can adapt.
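
As a minimal sketch of what such a document could look like, the Python snippet below builds a skeleton profile and serializes it to JSON. Every key name and placeholder value here is an illustrative assumption; it is not the schema of bill_paseman_patient_profile2.json.

```python
import json


def build_profile() -> dict:
    """Return a skeleton patient profile. All key names are hypothetical."""
    return {
        "demographics": {"role": "patient-researcher"},
        "primary_diagnoses": [
            {
                "name": "<diagnosis>",
                "stage": "<stage>",
                "date": "<YYYY-MM-DD>",
                "institution": "<institution>",
                "status": "<NED | active | surveillance>",
            }
        ],
        "comorbidities": [],       # each entry: ICD code + current management
        "treatment_history": [],   # each entry: treatment + outcome
        "germline_findings": [],   # variant, zygosity, frequency, significance
        "somatic_findings": [],    # leave empty if no tumor sequencing exists
        "labs_and_imaging": [],
        "open_questions": [],
    }


# Serialize with indentation so the file stays readable for humans and LLMs.
profile_json = json.dumps(build_profile(), indent=2)
```

Keeping one top-level key per bullet in the list above makes it easy to see at a glance which parts of the story are still empty.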

Step 4. Add an LLM Orientation Prompt at the Very Top

This is the most important design lesson from this project. When LLMs receive large medical documents without a grounding frame, they hallucinate — misidentifying conditions, confusing comorbidities with primary diagnoses, and suggesting treatments already ruled out by the data.

Your orientation prompt should explicitly state:

  • Who you are and what your actual diagnoses are (with current status — NED, active, surveillance)
  • What the document IS and IS NOT
  • What has been definitively ruled out (syndromes, treatments)
  • How to navigate the document for different types of questions
Without This Step
In testing, LLMs given the patient profile without an orientation prompt consistently hallucinated additional cancers, described the patient as having active metastatic disease, and suggested treatments (like belzutifan) that the genomic data explicitly contraindicated. The orientation prompt reduced these errors dramatically.
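
The four elements of the orientation prompt can be assembled mechanically. The following is a sketch under stated assumptions: the function names, the wording of each line, and the argument layout are hypothetical and are not the prompt used in this project.

```python
def build_orientation_prompt(patient, diagnoses, is_text, is_not_text,
                             ruled_out, navigation):
    """Assemble a grounding frame for the top of the profile.

    diagnoses:  list of (diagnosis, current_status) pairs
    ruled_out:  list of syndromes or treatments excluded by the data
    navigation: dict mapping a question type to the section that answers it
    """
    lines = [
        f"ORIENTATION FOR AI READERS: patient profile of {patient}.",
        "Actual diagnoses and current status:",
    ]
    lines += [f"  - {name}: {status}" for name, status in diagnoses]
    lines.append(f"This document IS: {is_text}")
    lines.append(f"This document IS NOT: {is_not_text}")
    lines.append("Definitively ruled out:")
    lines += [f"  - {item}" for item in ruled_out]
    lines.append("How to navigate:")
    lines += [f"  - {kind}: see {section}" for kind, section in navigation.items()]
    return "\n".join(lines)


def with_orientation(prompt, profile_text):
    """Place the orientation prompt ahead of the profile body."""
    return prompt + "\n\n" + profile_text
```

Generating the prompt from the same data that fills the profile keeps the two from drifting apart when a diagnosis status changes.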

Step 5. Annotate Every Finding with Clinical Significance

Don't just list data — annotate each finding with what it means. For every variant, lab result, or genomic finding, include:

  • A clinical_significance field in plain English (HIGH / MEDIUM / LOW)
  • Evidence quality label (confirmed, plausible, ruled_out, unknown)
  • Cross-references to related findings in other sections
  • What it means for treatment decisions

This transforms raw data into a reasoning document that any clinician or AI can act on.
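
One way to keep annotations consistent is to validate the labels when a finding is recorded. This is a minimal sketch: only clinical_significance and the label values come from the list above; the other key names and the annotate helper are assumptions made for illustration.

```python
ALLOWED_SIGNIFICANCE = {"HIGH", "MEDIUM", "LOW"}
ALLOWED_EVIDENCE = {"confirmed", "plausible", "ruled_out", "unknown"}


def annotate(finding, significance, evidence, meaning,
             cross_refs=(), treatment_note=""):
    """Return a copy of `finding` with annotation fields attached."""
    if significance not in ALLOWED_SIGNIFICANCE:
        raise ValueError(f"unknown significance label: {significance}")
    if evidence not in ALLOWED_EVIDENCE:
        raise ValueError(f"unknown evidence label: {evidence}")
    annotated = dict(finding)  # do not mutate the caller's record
    annotated.update({
        "clinical_significance": significance,
        "evidence_quality": evidence,          # hypothetical key name
        "plain_english": meaning,              # hypothetical key name
        "cross_references": list(cross_refs),  # hypothetical key name
        "treatment_implication": treatment_note,  # hypothetical key name
    })
    return annotated
```

Rejecting unknown labels early prevents near-duplicates such as "High" and "HIGH" from creeping into the record.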

Phase 2. Germline Analysis
Determine whether your condition is hereditary and identify constitutional risk factors.

Step 6. Identify Disease-Relevant Germline Genes

For your specific condition, research which hereditary syndromes are associated with it and which genes to check. Resources:

  • ClinVar (clinvar.ncbi.nlm.nih.gov) — searchable database of variants and clinical significance
  • OMIM (omim.org) — genetic disease reference encyclopedia
  • NCCN Guidelines — gene panels recommended for specific cancer types (free registration)
  • Your condition's patient advocacy organization — many maintain curated gene lists

Example: Bill's Gene List
  • p1RCC (type 1 papillary renal cell carcinoma) → check MET, FH, BAP1
  • Meningioma → check NF2, SMARCE1, SMARCB1, LZTR1, SUFU, PTCH1
  • Both together + family history concern → check Lynch syndrome MMR genes (MLH1, MSH2, MSH6, PMS2)

Step 7. Query Your Germline VCF for Target Genes

If you have a VCF file, you or a bioinformatician can filter it for your target genes. In this project we used command-line tools (zcat + grep) and Python scripts on annotated CSV files from BGI.
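
For readers who prefer Python to zcat + grep, the sketch below does a comparable filter. It is not the script used in this project. It assumes an annotated VCF in which gene symbols appear somewhere in the INFO column (column 8); where exactly they appear depends on the annotator. Unlike a plain grep, it matches whole symbols, so a search for MET does not also return METTL3.

```python
import gzip
import re


def filter_vcf_lines(lines, genes):
    """Return VCF records whose INFO column mentions any target gene."""
    # Whole-symbol match: the gene must not be flanked by letters,
    # digits, underscores, or hyphens.
    pattern = re.compile(
        r"(?<![\w-])(?:" + "|".join(map(re.escape, genes)) + r")(?![\w-])"
    )
    hits = []
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 8 and pattern.search(fields[7]):
            hits.append(line.rstrip("\n"))
    return hits


def filter_vcf_file(path, genes):
    """Open a plain or gzip-compressed VCF and filter it by gene."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return filter_vcf_lines(handle, genes)
```

A call such as filter_vcf_file("germline.vcf.gz", ["MET", "FH", "BAP1"]) would return the matching records as strings; the file name is a placeholder.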

For each variant found, assess:

  • Population frequency — variants >1% frequency are almost never clinically significant
  • SIFT and PolyPhen2 scores — functional impact predictors
  • ClinVar status — is it classified as Pathogenic, Likely Pathogenic, VUS, or Benign?
  • Variant type — truncating (frameshift, stop-gain, splice) variants are higher concern than missense
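
The criteria above can be turned into a first-pass triage that decides which variants deserve a closer look. This is a sketch only: the bucket names, the variant-type labels, and the rule order are assumptions, SIFT and PolyPhen2 scores are left out for brevity, and the output is a reading order, not a clinical classification.

```python
# Hypothetical labels for truncating variant types; match them to
# whatever vocabulary your annotation tool emits.
TRUNCATING = {"frameshift", "stop_gained", "splice"}


def triage_variant(pop_freq, clinvar, variant_type):
    """Assign a variant to a review bucket.

    pop_freq:     population allele frequency as a fraction, or None
    clinvar:      ClinVar classification string, e.g. "VUS"
    variant_type: annotation label, e.g. "missense"
    """
    if clinvar in {"Pathogenic", "Likely Pathogenic"}:
        return "review with genetic counselor"
    if pop_freq is not None and pop_freq > 0.01:  # more common than 1%
        return "likely benign polymorphism"
    if clinvar == "Benign":
        return "likely benign"
    if variant_type in TRUNCATING:
        return "higher concern"
    return "uncertain"
```

ClinVar pathogenic calls are checked first so that a known pathogenic variant is never dismissed on frequency alone.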

Step 8. Interpret Germline Findings with Care

The single most important judgment call in germline analysis is distinguishing a pathogenic variant from a benign polymorphism. Common polymorphisms (>1–5% population frequency) with benign functional predictions are almost never clinically significant, even in disease-associated genes.

When in Doubt
Request a genetic counselor review of any variant you're uncertain about before drawing clinical conclusions. A variant in BRCA1 is not the same as a pathogenic BRCA1 variant. This distinction matters enormously for family cascade testing decisions.

Phase 3. Somatic Tumor Analysis
Requires tumor sequencing data: RNA-seq, WES/WGS, or a clinical genomic panel.

Step 9. Identify Differentially Expressed Genes (RNA-seq)

If you have RNA-seq from tumor vs. matched normal tissue, calculate fold-changes for each gene. Prioritize:

  • Genes strongly UP in tumor — potential drivers or therapeutic targets
  • Genes strongly DOWN in tumor — tumor suppressors lost
  • Genes completely absent in normal but present in tumor — potential fusions or aberrant activation

Tools: DESeq2, edgeR (bioinformatics software). Or ask an LLM to analyze a processed counts table if you can provide one.
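
To make the fold-change idea concrete, here is a simplified calculation for one tumor sample against one normal sample. It is a sketch, not a substitute for DESeq2 or edgeR: it normalizes to counts per million and adds a pseudocount (an assumed value of 1.0) so genes absent from one sample do not cause division by zero, and it provides no statistical test.

```python
import math


def log2_fold_changes(tumor_counts, normal_counts, pseudocount=1.0):
    """Return log2(tumor / normal) per gene after CPM normalization.

    Both arguments map gene symbol -> raw read count. Only genes present
    in both tables are compared. Each table must have a non-zero total.
    """
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    result = {}
    for gene in tumor_counts.keys() & normal_counts.keys():
        tumor_cpm = tumor_counts[gene] / tumor_total * 1e6
        normal_cpm = normal_counts[gene] / normal_total * 1e6
        result[gene] = math.log2(
            (tumor_cpm + pseudocount) / (normal_cpm + pseudocount)
        )
    return result
```

Positive values mean the gene is higher in tumor, negative values mean it is lower, and each unit is one doubling or halving.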

Step 10. Cross-Reference RNA Findings with DNA

The most powerful signals are convergent — the same gene showing abnormality at multiple levels simultaneously. This is how you separate noise from signal:

Pattern | Interpretation | Confidence
RNA overexpressed + DNA amplified + somatic mutations | Triple-positive driver; highest actionability | HIGHEST
RNA silenced + DNA deleted | Confirmed loss-of-function | HIGH
RNA silenced + DNA amplified | Paradox; investigate epigenetic silencing (promoter methylation) | REQUIRES INVESTIGATION
RNA signal only, DNA normal | Possible fusion or expression dysregulation; confirm with additional testing | MEDIUM
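
The pattern table can be expressed as a small lookup so that every gene is classified by the same rules. A sketch, assuming each gene has already been reduced to coarse RNA and DNA states; the state names and the function itself are hypothetical.

```python
def classify_convergence(rna, dna, somatic_mutation=False):
    """Map RNA/DNA states to (interpretation, confidence).

    rna: "over", "silenced", or "normal"
    dna: "amplified", "deleted", or "normal"
    """
    if rna == "over" and dna == "amplified" and somatic_mutation:
        return ("Triple-positive driver", "HIGHEST")
    if rna == "silenced" and dna == "deleted":
        return ("Confirmed loss-of-function", "HIGH")
    if rna == "silenced" and dna == "amplified":
        return ("Paradox: investigate epigenetic silencing",
                "REQUIRES INVESTIGATION")
    if rna != "normal" and dna == "normal":
        return ("Possible fusion or expression dysregulation", "MEDIUM")
    # Combinations the table does not cover are reported, not guessed.
    return ("Not covered by the pattern table", "UNCLASSIFIED")
```

Returning an explicit UNCLASSIFIED result for uncovered combinations avoids forcing a gene into the nearest pattern.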

Step 11. Perform Somatic Copy Number Variant (CNV) Analysis

Compare tumor copy number against a germline baseline (ideally matched blood/normal tissue). Always confirm CNV is somatic — absent from germline — before calling it a tumor finding.

  • Copy ratio >1.5x = amplification (potential driver or target)
  • Copy ratio <0.8x = deletion (potential tumor suppressor loss)
  • Copy ratio ~1.0 = normal diploid
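
The thresholds and the somatic check can be combined in one function. This is a sketch using the cutoffs listed above; treating ratios between 0.8 and 1.5 as "no call" and the wording of the returned labels are assumptions.

```python
def _call(ratio):
    """Apply the copy-ratio thresholds to a single sample."""
    if ratio > 1.5:
        return "amplification"
    if ratio < 0.8:
        return "deletion"
    return "no call"


def classify_copy_ratio(tumor_ratio, germline_ratio=1.0):
    """Classify a region, reporting it as somatic only if germline is normal."""
    tumor_call = _call(tumor_ratio)
    if tumor_call == "no call":
        return "no call"
    if _call(germline_ratio) == tumor_call:
        # The same change is present in blood/normal tissue, so it is
        # constitutional rather than a tumor finding.
        return f"germline {tumor_call} (not somatic)"
    return f"somatic {tumor_call}"
```

For example, a region at 2.0x in tumor and 1.0x in matched blood is reported as a somatic amplification, while 0.5x in both is flagged as germline.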

Step 12. Run Mutational Signature Analysis (SBS)

Mutational signatures reveal the mechanism that created the tumor's mutations — aging, APOBEC activity, MMR deficiency, tobacco exposure, UV, etc. This has direct clinical implications:

Signature | Mechanism | Clinical Implication
SBS1 + SBS5 | Normal aging / clock-like | Expected in all cancers; level should match patient age
SBS2 + SBS13 (APOBEC) | Cytidine deaminase activity | Often associated with CDKN2A loss; cellular stress response
SBS6/15/21/26 | MMR deficiency | Lynch syndrome candidate; checkpoint inhibitors (pembrolizumab) may work well
SBS4 | Tobacco / carcinogen exposure | Common in lung cancer
SBS7 | UV radiation | Melanoma signature

Tools: SigProfilerExtractor (Python), COSMIC Mutational Signatures database (cancer.sanger.ac.uk/signatures/).
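
Signature tools start by tallying substitutions in a strand-independent form. The sketch below shows only that first step: it collapses each single-base change into one of the six pyrimidine-centered classes (C>A, C>G, C>T, T>A, T>C, T>G). It does not add trinucleotide context and does not assign SBS signatures; that remains the job of a tool such as SigProfilerExtractor. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Express a single-base change relative to the pyrimidine strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: report the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(snvs):
    """Count substitution classes from (ref, alt) pairs, skipping indels."""
    return Counter(
        substitution_class(ref, alt)
        for ref, alt in snvs
        if len(ref) == 1 and len(alt) == 1 and ref != alt
    )
```

A G>A change and a C>T change land in the same class because they describe the same event seen from opposite strands.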

Step 13. Calculate Microsatellite Instability (MSI)

Count somatic indels per megabase from your somatic indel VCF. Treat the result as a research-grade proxy for MSI status, which informs whether checkpoint inhibitor monotherapy is likely to be appropriate:

Category | Indels/Mb | Clinical Implication
MSS (Microsatellite Stable) | <2 | Checkpoint monotherapy unlikely to work; combination approaches needed
MSI-L (Low) | 2–10 | Intermediate; clinical significance varies by tumor type
MSI-H (High) | >10 | Lynch syndrome candidate; pembrolizumab/nivolumab often highly effective
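
The arithmetic behind the table is short. This sketch uses the cutoffs above; the choice of denominator (the number of bases the sequencing could actually call, which differs between WGS, WES, and panels) is an assumption you must supply, and the function names are hypothetical.

```python
def indels_per_mb(indel_count, callable_bases):
    """Somatic indels per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return indel_count / (callable_bases / 1e6)


def msi_category(rate):
    """Map an indels/Mb rate onto the categories in the table."""
    if rate < 2:
        return "MSS"
    if rate <= 10:
        return "MSI-L"
    return "MSI-H"
```

For example, 60 somatic indels over 30,000,000 callable bases gives 60 / 30 = 2.0 indels/Mb, which sits at the bottom of the MSI-L range.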

Phase 4. AI-Assisted Interpretation
Use multiple AI systems as a virtual tumor board. Diversity of AI opinion catches errors.

Step 14. Use Multiple AI Systems: The Surowiecki Principle

No single AI model has the full picture. Send your patient profile to multiple LLMs and compare their analyses. Diversity of AI opinion catches errors and reveals where the evidence is genuinely uncertain vs. where there is consensus.

The Wisdom of Crowds (Surowiecki) requires four conditions: diversity of sources, independence of models, decentralization (no single authority), and aggregation of outputs. Apply all four to your AI tumor board.

AI systems to consider: Claude, GPT-4o, Gemini, Perplexity (for real-time literature search).
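
The aggregation step can be as simple as tallying which claims every model made and which only some did. A sketch, assuming you have already reduced each model's answer to a set of short, consistently worded claims by hand; that normalization, the function name, and the model labels in the usage note are all hypothetical.

```python
from collections import Counter


def aggregate_claims(model_claims):
    """Split claims into consensus (all models) and disputed (some models).

    model_claims: dict mapping a model label to a collection of claims.
    """
    counts = Counter(
        claim
        for claims in model_claims.values()
        for claim in set(claims)  # count each claim once per model
    )
    total_models = len(model_claims)
    consensus = sorted(c for c, n in counts.items() if n == total_models)
    disputed = sorted(c for c, n in counts.items() if n < total_models)
    return consensus, disputed
```

Calling aggregate_claims({"model_a": {"x", "y"}, "model_b": {"x"}}) returns (["x"], ["y"]). The disputed list is where the evidence is genuinely uncertain and where follow-up questions are best spent.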

Without the Orientation Prompt
The LLM orientation prompt in your patient profile (Step 4) is what separates useful AI responses from hallucinated ones. Without it, most LLMs will misidentify your conditions, confuse comorbidities with primary diagnoses, or suggest treatments your own data has already ruled out.

Step 15. Conduct Structured AI Tumor Board Sessions

Frame your questions specifically rather than asking "what do I have?" or "what should I do?":

  • "Given the somatic findings in the genetics section, what would be the first-line treatment at recurrence?"
  • "Are there clinical trials for [condition] that match my molecular profile?"
  • "What does the co-occurrence of [condition A] and [condition B] suggest about underlying mechanisms?"
  • "What are the limitations of this analysis and what data would change the conclusions?"

Always ask the AI to label each claim with an evidence grade and include citations (PMID or DOI format).

Step 16. Challenge the AI's Conclusions

Ask: "What alternative interpretations exist?" and "What would need to be true for your conclusion to be wrong?"

Example from This Project
The MET paradox (amplified DNA, silenced RNA) was only discovered by pushing past the standard p1RCC assumption that MET is the primary driver. The AI initially accepted the textbook model until challenged with the actual RNA expression data, which showed 307-fold silencing. Questioning default assumptions is often where the most actionable insights are found.

Phase 5. Closing the Loop with the Medical System
Research-grade data must be validated before driving clinical decisions.

Step 17. Get CLIA-Certified Validation of Key Findings

Research-grade data (hackathons, direct-to-consumer sequencing) cannot directly drive clinical decisions. To get your findings into the medical record and usable by treating physicians, the most practical path is to order a CLIA-certified clinical genomic panel on archived tumor tissue or circulating tumor DNA.

Panel | Type | Typical Cost | Notes
FoundationOne CDx | Tissue or liquid biopsy | ~$5,800 (often covered by insurance) | Comprehensive solid tumor panel; FDA-approved companion diagnostic
Tempus xT | Tissue + RNA | ~$3,000–$5,000 | Includes RNA fusion analysis; strong in rare cancers
Caris MI Profile | Tissue | ~$3,000–$5,000 | IHC + sequencing + expression; broad coverage
Guardant360 | Liquid biopsy (blood) | ~$2,000–$3,000 | No tissue needed; may miss some findings present only in archived tissue

Critical Gap Identified in Bill's Case
All genomic analysis lived in a patient-maintained shadow record, invisible to any treating oncologist. A recurrence scenario without CLIA validation means a new oncologist would make treatment decisions knowing only basic pathology — not knowing PDGFRA is the top driver, MET is not, or belzutifan is contraindicated. This is the most dangerous gap in patient-led research.

Step 18. Share Your Patient Profile with Your Care Team

Bring a printed or PDF version of your patient profile to appointments. Frame it as: "I've organized my medical history here — it may help you understand my full picture."

Physicians respond far better to organized, cited patient research than to unstructured verbal summaries. The HTML report format (bill_paseman_genomic_report.html) generated in this project is designed to be shareable with care teams and readable without technical expertise.

Step 19. Connect with Patient Advocacy and Research Communities

  • Your condition's patient advocacy organization (often has research grants, biobank programs, and clinical trial connections)
  • Research hackathons — the model used in Bill's case: patient-organized computational analysis of personal genomic data with bioinformatics volunteers
  • Academic medical centers with rare disease or precision oncology programs
  • ClinicalTrials.gov — search your condition + molecular target (e.g., "papillary RCC PDGFRA")
  • MatchMiner, TrialSpark, Massive Bio — AI-assisted clinical trial matching services

Tools & Resources Reference

Tool / Resource | Purpose | Cost | Skill Level
BGI / Nebula / Dante Labs | Whole genome or exome sequencing | $300–$2,000 | None (send sample, receive files)
Strelka2 | Somatic variant calling (tumor vs. normal) | Free (open source) | Bioinformatics
SnpEff / VEP | Variant functional annotation | Free (open source) | Bioinformatics
DESeq2 / edgeR | RNA-seq differential expression | Free (R packages) | Bioinformatics / R
SigProfilerExtractor | COSMIC mutational signature extraction | Free (Python) | Python basics
Ensembl REST API | Variant annotation, trinucleotide context | Free | Python/API basics
ClinVar (clinvar.ncbi.nlm.nih.gov) | Variant clinical significance database | Free | None
OMIM (omim.org) | Genetic disease encyclopedia | Free | None
COSMIC (cancer.sanger.ac.uk) | Cancer somatic mutations + signatures database | Free | None
ClinicalTrials.gov | Clinical trial search by condition + target | Free | None
Claude / GPT-4o / Gemini | AI tumor board, interpretation, report generation | $20–$30/month | None (conversational)
FoundationOne CDx / Tempus xT | CLIA-certified clinical validation | $3,000–$6,000 (often insured) | None (physician orders)

If You Have No Bioinformatics Skills
You can still accomplish most of this framework by: (1) obtaining your raw data files, (2) building your patient profile document in plain text or Word, (3) uploading relevant files to an AI system and asking it to perform the analysis. Claude Code and similar AI tools can run Python analysis scripts on your data files directly. For complex analysis, contact university bioinformatics departments — many offer pro-bono or low-cost assistance for rare disease patients through programs like Rare Genomics Institute or local academic hackathon programs.

The Core Principle

"The patient, not the institution, is the only entity with continuous access to the full picture — across all providers, all time points, and all data types. Institutions see fragments. The patient sees the whole."
— AdvocateOS Framework, Bill Paseman

The most important insight from this project is not technical — it is organizational. The role of tools like the patient profile JSON, LLM tumor boards, and this analytical framework is to give the patient's comprehensive view the structure it needs to be medically actionable.

A patient who has organized their own genomic data, built a structured profile, run multi-vendor AI analysis, and obtained CLIA validation arrives at a clinical encounter not as a passive recipient of care — but as an informed collaborator with data their oncologist may not have access to anywhere else.

What This Framework Enables