Bill Paseman's case is a proof of concept for patient-led precision medicine research — a model where the patient, not the institution, drives deep multi-omic investigation of their own condition. Over the course of this project (spanning a 2018 hackathon and 2026 AI analysis), the following was accomplished without institutional support:
The single most important step. You have a legal right to request your genomic data files under HIPAA in the US and GDPR in Europe. Don't accept only a PDF report — request the underlying VCF, BAM/CRAM, or counts files.
| Data Type | What to Ask For | Who Has It |
|---|---|---|
| Germline WGS / WES | VCF files (.vcf or .vcf.gz), BAM/CRAM alignment files | Your genetics lab, Nebula Genomics, Dante Labs, Genome Medical, research biobanks |
| Tumor sequencing | Somatic VCF, RNA-seq counts, CNV files | Your hospital's pathology or molecular tumor board |
| Clinical genomic panel | PDF report + raw VCF | FoundationOne CDx, Tempus xT, Caris MI Profile, Guardant |
| Microarray / SNP chip | Raw genotype text file, CEL files, or VCF | 23andMe, AncestryDNA (limited utility for rare disease) |
Collect all pathology reports, operative notes, radiology reports, lab results, and genetics consult notes. Most patients dramatically underestimate how much data they have scattered across institutions.
A structured document (plain text, JSON, or Word) that puts your entire medical story in one place. This serves two purposes: (1) gives AI systems the context they need to reason correctly about your case, and (2) becomes your permanent research record, independent of any EHR.
What to include:
The file bill_paseman_patient_profile2.json from this project is a working template you can adapt.
This is the most important design lesson from this project. When LLMs receive large medical documents without a grounding frame, they hallucinate — misidentifying conditions, confusing comorbidities with primary diagnoses, and suggesting treatments already ruled out by the data.
Your orientation prompt should explicitly state:
Don't just list data — annotate each finding with what it means. For every variant, lab result, or genomic finding, include:
- clinical_significance field in plain English (HIGH / MEDIUM / LOW)

This transforms raw data into a reasoning document that any clinician or AI can act on.
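As an illustration of what an annotated finding might look like, here is a minimal sketch. Only the `clinical_significance` field and its three levels come from the text above; the other field names (`name`, `value`, `interpretation`), the helper `make_finding`, and the placeholder values are assumptions made for this example, not the schema used in the project's JSON file.

```python
import json

ALLOWED_SIGNIFICANCE = {"HIGH", "MEDIUM", "LOW"}


def make_finding(name, value, clinical_significance, interpretation):
    """Build one annotated finding (field names other than
    clinical_significance are hypothetical)."""
    if clinical_significance not in ALLOWED_SIGNIFICANCE:
        raise ValueError(
            f"clinical_significance must be one of {sorted(ALLOWED_SIGNIFICANCE)}"
        )
    return {
        "name": name,
        "value": value,
        "clinical_significance": clinical_significance,
        "interpretation": interpretation,
    }


# Placeholder content only; this is not a real result.
finding = make_finding(
    name="EXAMPLE_GENE copy number",
    value="placeholder value",
    clinical_significance="MEDIUM",
    interpretation="Plain-English explanation of what this finding means.",
)
print(json.dumps(finding, indent=2))
```

Rejecting anything outside the three allowed levels keeps the grading consistent across the whole profile, which makes it easier for a reader (human or AI) to sort findings by priority.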
For your specific condition, research which hereditary syndromes are associated with it and which genes to check. Resources:
If you have a VCF file, you or a bioinformatician can filter it for your target genes. In this project we used command-line tools (zcat + grep) and Python scripts on annotated CSV files from BGI.
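A rough Python equivalent of the zcat + grep approach might look like the sketch below. It assumes the VCF has already been annotated so that gene symbols appear somewhere in the INFO column (for example by SnpEff or VEP); the function names and the gene list in the usage comment are illustrative, not taken from the project.

```python
import gzip
import re


def filter_vcf_lines(lines, genes):
    """Yield header lines plus records whose INFO column mentions a target gene."""
    # Match whole symbols only, so "TP53" does not also match "TP53BP1".
    symbols = "|".join(re.escape(g) for g in genes)
    pattern = re.compile(
        r"(?<![A-Za-z0-9_-])(?:" + symbols + r")(?![A-Za-z0-9_-])"
    )
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a valid VCF
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 8 (index 7) is INFO in the VCF layout.
        if len(fields) > 7 and pattern.search(fields[7]):
            yield line


def filter_vcf_file(path, genes):
    """Stream a plain or gzipped VCF from disk through filter_vcf_lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from filter_vcf_lines(handle, genes)


# Hypothetical usage (file name and gene list are placeholders):
# for line in filter_vcf_file("my_genome.annotated.vcf.gz", ["TP53", "BRCA2"]):
#     print(line, end="")
```

Streaming line by line keeps memory use low, which matters because whole-genome VCFs can contain millions of records.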
For each variant found, assess:
The single most important judgment call in germline analysis is distinguishing a pathogenic variant from a benign polymorphism. Common polymorphisms (>1–5% population frequency) with benign functional predictions are almost never clinically significant, even in disease-associated genes.
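The frequency rule above can be expressed as a simple triage step. This is a sketch, not a classification standard: the function name, the labels it returns, and the default cutoff of 1% (the low end of the range mentioned above) are assumptions, and anything it does not flag as a likely polymorphism still needs expert review.

```python
def triage_variant(population_af, predicted_benign, common_af_threshold=0.01):
    """First-pass triage of a germline variant.

    population_af: allele frequency as a fraction (0.03 means 3%), or None.
    predicted_benign: True if functional prediction tools call it benign.
    common_af_threshold: assumed cutoff for "common"; adjust as appropriate.
    """
    if population_af is None:
        return "REVIEW: no population frequency available"
    if population_af > common_af_threshold and predicted_benign:
        return "LIKELY BENIGN POLYMORPHISM"
    return "REVIEW"


print(triage_variant(0.12, predicted_benign=True))
print(triage_variant(0.0001, predicted_benign=False))
```

The function deliberately never returns "pathogenic": rarity alone does not establish that a variant causes disease, so rare or unannotated variants are only routed to review.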
If you have RNA-seq from tumor vs. matched normal tissue, calculate fold-changes for each gene. Prioritize:
Tools: DESeq2, edgeR (bioinformatics software). Alternatively, ask an LLM to analyze a processed counts table if you can provide one.
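To make the fold-change idea concrete, here is a minimal sketch for a single tumor/normal pair. It is not what DESeq2 or edgeR do (those model variance across replicates); it simply scales each sample to counts per million and takes a log2 ratio with a pseudocount to avoid dividing by zero. The function name and the pseudocount default are assumptions.

```python
import math


def log2_fold_changes(tumor_counts, normal_counts, pseudocount=1.0):
    """Return {gene: log2(tumor CPM / normal CPM)} for genes present in both samples."""
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    if tumor_total <= 0 or normal_total <= 0:
        raise ValueError("each sample needs a positive total read count")

    result = {}
    for gene in tumor_counts.keys() & normal_counts.keys():
        # Counts per million corrects for different sequencing depth.
        tumor_cpm = tumor_counts[gene] / tumor_total * 1e6
        normal_cpm = normal_counts[gene] / normal_total * 1e6
        result[gene] = math.log2(
            (tumor_cpm + pseudocount) / (normal_cpm + pseudocount)
        )
    return result


# Toy counts with placeholder gene names.
fc = log2_fold_changes({"GENE_A": 300, "GENE_B": 100}, {"GENE_A": 100, "GENE_B": 300})
for gene, value in sorted(fc.items()):
    print(gene, round(value, 2))
```

Positive values mean higher expression in the tumor, negative values mean lower. With only one sample per condition there is no statistical test behind these numbers, so treat large fold-changes as leads to follow up rather than as findings.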
The most powerful signals are convergent — the same gene showing abnormality at multiple levels simultaneously. This is how you separate noise from signal:
| Pattern | Interpretation | Confidence |
|---|---|---|
| RNA overexpressed + DNA amplified + somatic mutations | Triple-positive driver — highest actionability | HIGHEST |
| RNA silenced + DNA deleted | Confirmed loss-of-function | HIGH |
| RNA silenced + DNA amplified | Paradox — investigate epigenetic silencing (promoter methylation) | REQUIRES INVESTIGATION |
| RNA signal only, DNA normal | Possible fusion or expression dysregulation — confirm with additional testing | MEDIUM |
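The table above can be encoded as a small lookup so that every gene is graded the same way. This is a sketch: the function name, the input vocabulary ("overexpressed", "silenced", "amplified", "deleted", "normal"), and the "UNCLASSIFIED" fallback for combinations the table does not cover are assumptions.

```python
def convergence_call(rna, dna, somatic_mutations=False):
    """Map per-gene RNA and DNA status to the confidence labels in the table.

    rna: "overexpressed", "silenced", or "normal"
    dna: "amplified", "deleted", or "normal"
    somatic_mutations: True if the gene also carries somatic mutations
    """
    if rna == "overexpressed" and dna == "amplified" and somatic_mutations:
        return "HIGHEST"
    if rna == "silenced" and dna == "deleted":
        return "HIGH"
    if rna == "silenced" and dna == "amplified":
        return "REQUIRES INVESTIGATION"
    if rna in ("overexpressed", "silenced") and dna == "normal":
        return "MEDIUM"
    # Not covered by the table; flagged rather than guessed.
    return "UNCLASSIFIED"


print(convergence_call("overexpressed", "amplified", somatic_mutations=True))
print(convergence_call("silenced", "amplified"))
```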
Compare tumor copy number against a germline baseline (ideally matched blood/normal tissue). Always confirm CNV is somatic — absent from germline — before calling it a tumor finding.
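One way to apply this check is sketched below. It assumes you already have per-gene integer copy numbers for tumor and matched normal, and that the expected baseline is two copies (true for autosomal genes, not for X or Y in males). The function name, labels, and gene names are illustrative.

```python
def confirm_somatic_cnvs(tumor_cn, germline_cn, baseline=2):
    """Label each tumor copy-number change by whether the germline also shows it."""
    calls = {}
    for gene, tumor_copies in tumor_cn.items():
        if tumor_copies == baseline:
            continue  # no copy-number change in the tumor
        germline_copies = germline_cn.get(gene)
        if germline_copies is None:
            calls[gene] = "UNCONFIRMED: no germline data"
        elif germline_copies == baseline:
            calls[gene] = "SOMATIC"
        else:
            calls[gene] = "PRESENT IN GERMLINE"
    return calls


# Toy values with placeholder gene names.
calls = confirm_somatic_cnvs(
    tumor_cn={"GENE_A": 5, "GENE_B": 1, "GENE_C": 2, "GENE_D": 3},
    germline_cn={"GENE_A": 2, "GENE_B": 1, "GENE_C": 2},
)
print(calls)
```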
Mutational signatures reveal the mechanism that created the tumor's mutations — aging, APOBEC activity, MMR deficiency, tobacco exposure, UV, etc. This has direct clinical implications:
| Signature | Mechanism | Clinical Implication |
|---|---|---|
| SBS1 + SBS5 | Normal aging / clock-like | Expected in all cancers; level should match patient age |
| SBS2 + SBS13 (APOBEC) | Cytidine deaminase activity | Often associated with CDKN2A loss; cellular stress response |
| SBS6/15/21/26 | MMR deficiency | Lynch syndrome candidate; checkpoint inhibitors (pembrolizumab) may work well |
| SBS4 | Tobacco / carcinogen exposure | Common in lung cancer |
| SBS7 | UV radiation | Melanoma signature |
Tools: SigProfilerExtractor (Python), COSMIC Mutational Signatures database (cancer.sanger.ac.uk/signatures/).
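Before running a full signature extraction, it can help to see the raw substitution spectrum. The sketch below shows only the coarse first step: collapsing single-base substitutions into the six pyrimidine-reference classes (C>A, C>G, C>T, T>A, T>C, T>G). It does not assign SBS signatures, which requires the 96 trinucleotide contexts and a tool such as SigProfilerExtractor. Function names are assumptions.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the pyrimidine-normalized class (e.g. "C>T"), or None if not an SNV."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        return None  # indel or multi-base change
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        return None
    if ref in ("A", "G"):
        # Report purine-reference changes on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_substitutions(variants):
    """Count substitution classes from an iterable of (ref, alt) pairs."""
    counts = Counter()
    for ref, alt in variants:
        cls = substitution_class(ref, alt)
        if cls is not None:
            counts[cls] += 1
    return counts


spectrum = count_substitutions([("C", "T"), ("G", "A"), ("A", "G"), ("AT", "A")])
print(dict(spectrum))
```

In the toy input, G>A is counted as C>T because both describe the same change viewed from opposite strands, and the deletion is skipped.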
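Count somatic indels per megabase from your somatic indel VCF. This gives a rough, research-grade estimate of microsatellite instability, which in turn informs whether checkpoint inhibitor monotherapy is likely to be appropriate. Clinical MSI status is established by dedicated PCR, IHC, or NGS-based testing, so treat the categories below as a screening heuristic: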
| Category | Indels/Mb | Clinical Implication |
|---|---|---|
| MSS (Microsatellite Stable) | <2 | Checkpoint monotherapy unlikely to work; combination approaches needed |
| MSI-L (Low) | 2–10 | Intermediate; clinical significance varies by tumor type |
| MSI-H (High) | >10 | Lynch syndrome candidate; pembrolizumab/nivolumab often highly effective |
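The thresholds in the table translate directly into code. In this sketch the cutoffs come from the table above, while the function names and the `callable_megabases` parameter (the size of the sequenced region that could actually be assessed) are assumptions; you would need to obtain that figure from your own sequencing data.

```python
def indels_per_mb(indel_count, callable_megabases):
    """Somatic indel burden per megabase of assessable sequence."""
    if callable_megabases <= 0:
        raise ValueError("callable_megabases must be positive")
    return indel_count / callable_megabases


def msi_category(rate):
    """Map an indels/Mb rate to the categories in the table above."""
    if rate < 2:
        return "MSS"
    if rate <= 10:
        return "MSI-L"
    return "MSI-H"


# Toy numbers, not real patient data.
rate = indels_per_mb(indel_count=30, callable_megabases=30.0)
print(rate, msi_category(rate))
```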
No single AI model has the full picture. Send your patient profile to multiple LLMs and compare their analyses. Diversity of AI opinion catches errors and reveals where the evidence is genuinely uncertain vs. where there is consensus.
The Wisdom of Crowds (Surowiecki) requires four conditions: diversity of sources, independence of models, decentralization (no single authority), and aggregation of outputs. Apply all four to your AI tumor board.
AI systems to consider: Claude, GPT-4o, Gemini, Perplexity (for real-time literature search).
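The aggregation step can be as simple as tallying which conclusions each model reached. The sketch below assumes you have already read each response and reduced it by hand to short, comparable claim labels; the function name, model labels, and claim strings are placeholders, and no vendor API is called.

```python
from collections import Counter


def tally_claims(claims_by_model):
    """Return {claim: (models_agreeing, total_models)} across all responses."""
    total = len(claims_by_model)
    counts = Counter()
    for claims in claims_by_model.values():
        counts.update(set(claims))  # count each claim once per model
    return {claim: (n, total) for claim, n in counts.items()}


# Placeholder labels standing in for manually normalized claims.
tally = tally_claims({
    "model_1": ["claim_x", "claim_y"],
    "model_2": ["claim_x"],
    "model_3": ["claim_x", "claim_z"],
})
for claim, (agree, total) in sorted(tally.items()):
    status = "consensus" if agree == total else "contested"
    print(f"{claim}: {agree}/{total} ({status})")
```

Agreement is not proof: models trained on overlapping data are not fully independent, so a unanimous claim still needs its citations checked.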
Frame your questions specifically rather than asking "what do I have?" or "what should I do?":
Always ask the AI to label each claim with an evidence grade and include citations (PMID or DOI format).
Ask: "What alternative interpretations exist?" and "What would need to be true for your conclusion to be wrong?"
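Once responses come back, a quick script can flag statements that arrived without a citation. This sketch treats each non-empty line as one claim and looks for text shaped like a PMID or DOI; the function name and the sample identifiers are made up for illustration. It checks only that a citation is present, not that the cited paper exists or supports the claim, so every reference still has to be looked up.

```python
import re

PMID_RE = re.compile(r"\bPMID:?\s*\d{1,8}\b", re.IGNORECASE)
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def uncited_lines(response_text):
    """Return non-empty lines that contain neither a PMID nor a DOI."""
    flagged = []
    for line in response_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if PMID_RE.search(stripped) or DOI_RE.search(stripped):
            continue
        flagged.append(stripped)
    return flagged


# The identifiers below are placeholders, not real references.
sample = (
    "Claim one is supported (PMID: 12345678).\n"
    "Claim two has no source.\n"
    "Claim three cites doi:10.1000/example.123\n"
)
print(uncited_lines(sample))
```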
Research-grade data (hackathons, direct-to-consumer sequencing) cannot directly drive clinical decisions. To get your findings into the medical record and usable by treating physicians, the most practical path is to order a CLIA-certified clinical genomic panel on archived tumor tissue or circulating tumor DNA.
| Panel | Type | Typical Cost | Notes |
|---|---|---|---|
| FoundationOne CDx | Tissue (blood-based version: FoundationOne Liquid CDx) | ~$5,800 (often covered by insurance) | Comprehensive solid tumor panel; FDA-approved companion diagnostic |
| Tempus xT | Tissue + RNA | ~$3,000–$5,000 | Includes RNA fusion analysis; strong in rare cancers |
| Caris MI Profile | Tissue | ~$3,000–$5,000 | IHC + sequencing + expression; broad coverage |
| Guardant360 | Liquid biopsy (blood) | ~$2,000–$3,000 | No tissue needed; may miss some findings present only in archived tissue |
Bring a printed or PDF version of your patient profile to appointments. Frame it as: "I've organized my medical history here — it may help you understand my full picture."
Physicians respond far better to organized, cited patient research than to unstructured verbal summaries. The HTML report format (bill_paseman_genomic_report.html) generated in this project is designed to be shareable with care teams and readable without technical expertise.
| Tool / Resource | Purpose | Cost | Skill Level |
|---|---|---|---|
| BGI / Nebula / Dante Labs | Whole genome or exome sequencing | $300–$2,000 | None (send sample, receive files) |
| Strelka2 | Somatic variant calling (tumor vs. normal) | Free (open source) | Bioinformatics |
| SnpEff / VEP | Variant functional annotation | Free (open source) | Bioinformatics |
| DESeq2 / edgeR | RNA-seq differential expression | Free (R packages) | Bioinformatics / R |
| SigProfilerExtractor | COSMIC mutational signature extraction | Free (Python) | Python basics |
| Ensembl REST API | Variant annotation, trinucleotide context | Free | Python/API basics |
| ClinVar (clinvar.ncbi.nlm.nih.gov) | Variant clinical significance database | Free | None |
| OMIM (omim.org) | Genetic disease encyclopedia | Free | None |
| COSMIC (cancer.sanger.ac.uk) | Cancer somatic mutations + signatures database | Free | None |
| ClinicalTrials.gov | Clinical trial search by condition + target | Free | None |
| Claude / GPT-4o / Gemini | AI tumor board, interpretation, report generation | $20–$30/month | None (conversational) |
| FoundationOne CDx / Tempus xT | CLIA-certified clinical validation | $3,000–$6,000 (often insured) | None (physician orders) |
The most important insight from this project is not technical — it is organizational. The role of tools like the patient profile JSON, LLM tumor boards, and this analytical framework is to give the patient's comprehensive view the structure it needs to be medically actionable.
A patient who has organized their own genomic data, built a structured profile, run multi-vendor AI analysis, and obtained CLIA validation arrives at a clinical encounter not as a passive recipient of care but as an informed collaborator, with data their oncologist may not have access to anywhere else.