Five Papers Accepted at ACL, SIGMETRICS, MLSys, and JAMIA Open in One Week

In a single week, Hill Research had four papers accepted at top-tier computer science conferences and one paper published in a peer-reviewed medical informatics journal. The research spans LLM reasoning, AI agents, retrieval-augmented generation, long-term memory systems, and clinical knowledge graphs — all foundational to the TriClick platform.

Combined with three presentations at AAAI 2026 in January, this brings Hill Research’s 2026 publication count to eight papers at premier venues.

ACL 2026 — Two Papers Accepted

ACL (Association for Computational Linguistics) is the top venue in computational linguistics. The conference takes place in San Diego this July.

Paper 1: Contract-Checked Editing for Verifier-Guided LLM Reasoning

“From Trajectories to Graphs: Contract-Checked Editing for Verifier-Guided LLM Reasoning” — Dr. Jack Li and team.

When search and recombination are used to improve LLM reasoning, most candidate solutions aren’t runnable: they have hidden dependencies, broken imports, or scope violations. At baseline, only 41.2% of recombined candidates were runnable by the verifier.

The paper introduces contract-checked graph editing: instead of treating LLM outputs as flat text, it represents them as typed reasoning graphs and applies a deterministic structural gate before invoking the expensive verifier. The gate checks acyclicity, namespace closure, schema validity, and terminal constraints.
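
To make the four checks concrete, here is a minimal sketch of what such a gate could look like. It is not the paper's implementation: the Node type, the ALLOWED_KINDS schema, and the rule of "exactly one answer node" as the terminal constraint are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                   # symbol this step defines
    kind: str                                   # node type from the schema
    inputs: list = field(default_factory=list)  # symbols this step reads


# Hypothetical schema: the node kinds a reasoning graph may contain.
ALLOWED_KINDS = {"given", "derive", "answer"}


def structural_gate(nodes):
    """Return contract violations; an empty list means the graph may be verified."""
    errors = []
    names = [n.name for n in nodes]
    defined = set(names)

    # Schema validity: unique names and known node kinds.
    if len(defined) != len(names):
        errors.append("schema: duplicate node name")
    for n in nodes:
        if n.kind not in ALLOWED_KINDS:
            errors.append(f"schema: unknown kind {n.kind!r} on {n.name}")

    # Namespace closure: every symbol a node reads is defined in the graph.
    for n in nodes:
        for dep in n.inputs:
            if dep not in defined:
                errors.append(f"namespace: {n.name} reads undefined {dep}")

    # Terminal constraint (assumed): exactly one answer node.
    if sum(n.kind == "answer" for n in nodes) != 1:
        errors.append("terminal: expected exactly one answer node")

    # Acyclicity via Kahn's algorithm over the defined dependencies.
    deps = {n.name: [d for d in n.inputs if d in defined] for n in nodes}
    indegree = {name: len(d) for name, d in deps.items()}
    consumers = {name: [] for name in deps}
    for name, d in deps.items():
        for dep in d:
            consumers[dep].append(name)
    ready = deque(name for name, k in indegree.items() if k == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for nxt in consumers[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(deps):
        errors.append("cycle: dependency graph is not acyclic")
    return errors


good = [Node("x", "given"), Node("y", "derive", ["x"]), Node("ans", "answer", ["y"])]
bad = [Node("y", "derive", ["z"]), Node("ans", "answer", ["y"])]
print(structural_gate(good))  # []
print(structural_gate(bad))   # ['namespace: y reads undefined z']
```

Every check here is deterministic and cheap compared with a verifier run, so a candidate like bad can be discarded before any verifier call is spent on it.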

Results:

Verifier-runnable recombination: 41.2% → 92.8%
Accuracy improvement: +6.1 on MATH, +9.1 on MATH Level 5
42% fewer verifier calls

Relevance to TriClick: TriClick runs multi-step reasoning pipelines where outputs must be correct and auditable. The same principle applies — filter structurally invalid candidates before they reach the expensive validation step.

Paper 2: HyperWorld — Hybrid World Models for Grounded Language Agents

“HyperWorld: Hybrid World Models for Grounded Language Agents” — Dr. Jack Li and team.

AI agents that interact with real environments tend to fail in compounding ways — taking actions based on bad assumptions and spiraling. Current approaches either scale poorly with long interactions (Transformers) or lose track of specific entities (state space models).

HyperWorld combines a state space model (SSM) dynamics core for efficient long-horizon processing with entity-centric episodic memory that tracks objects, constraints, and tool states. It adds a critic-guided rollout planner that imagines future trajectories and scores them before the agent commits to an action.
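
The planning idea can be sketched in a few lines. This is a toy illustration, not HyperWorld itself: plan, toy_step, and toy_critic are hypothetical names, a hand-written step function stands in for the learned dynamics core, and the search is exhaustive, so it only suits short horizons.

```python
from typing import Callable, Sequence


def plan(state, actions: Sequence[str], step: Callable, critic: Callable,
         horizon: int = 3) -> str:
    """Return the first action of the best-scoring imagined trajectory."""

    def best_value(s, depth: int) -> float:
        # At the end of the imagined rollout, ask the critic for a score.
        if depth == 0:
            return critic(s)
        # Otherwise imagine each next action and keep the best continuation.
        return max(best_value(step(s, a), depth - 1) for a in actions)

    # Score every first action by its best imagined continuation.
    return max(actions, key=lambda a: best_value(step(state, a), horizon - 1))


# Toy environment (hypothetical): the state is an integer and the goal is 5.
def toy_step(s: int, action: str) -> int:
    return s + 1 if action == "inc" else s - 1


def toy_critic(s: int) -> float:
    return -abs(5 - s)


print(plan(0, ["dec", "inc"], toy_step, toy_critic, horizon=2))  # inc
```

The agent never executes "dec" to find out that it is worse. Both options are simulated and scored first, and only the winning action is taken in the real environment.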

Results:

Outperformed GPT-4+ReAct by 11-14 points on ALFWorld, WebShop, and SciWorld
53% reduction in constraint violations

Relevance to TriClick: TriClick’s AI agents generate code, query datasets, and produce statistical outputs where every action has downstream consequences for compliance. An agent that simulates the impact of its decisions before executing them is one you can trust with regulatory-grade work.

SIGMETRICS 2026 — One Paper Accepted

ACM SIGMETRICS is the premier venue for performance measurement and modeling. The conference takes place in Ann Arbor this June.

EviDex: Provenance-Weighted Evidence-Path Indexing

“EviDex: Provenance-Weighted Evidence-Path Indexing for Fresh and Auditable Retrieval under Continuous Updates” — Dr. Jack Li.

RAG systems work on corpora that change constantly: FDA labels, NCCN guidelines, new contraindications. With a 15-minute TTL refresh window, the system can keep serving a superseded answer for up to 15 minutes after the source changes. In Hill Research’s clinical workload, 31.2% of time-sensitive queries depended on updates committed in the last 15 minutes.

EviDex replaces periodic refresh with log-structured online compaction over intent-partitioned evidence-path buckets. Every retrieved path carries its provenance so a regulator can audit exactly which source, at which version, supported which claim.
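
The provenance idea is easy to picture with a small sketch. The code below is not EviDex: it leaves out compaction and intent partitioning entirely and only shows, under assumed names (Evidence, EvidenceLog, a toy clock in minutes since midnight, placeholder text), how a retrieved item can carry its source, version, and commit time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    source: str        # e.g. a guideline identifier
    version: int       # 1, 2, 3, ... per source
    committed_at: int  # commit time in minutes since midnight (toy clock)
    text: str


class EvidenceLog:
    """Append-only log; nothing is overwritten, so old versions stay auditable."""

    def __init__(self) -> None:
        self._entries = []

    def commit(self, source: str, text: str, committed_at: int) -> Evidence:
        version = 1 + sum(1 for e in self._entries if e.source == source)
        entry = Evidence(source, version, committed_at, text)
        self._entries.append(entry)
        return entry

    def latest(self, source: str, as_of: int) -> Optional[Evidence]:
        """Newest version of `source` committed at or before `as_of`."""
        visible = [e for e in self._entries
                   if e.source == source and e.committed_at <= as_of]
        return max(visible, key=lambda e: e.version, default=None)


log = EvidenceLog()
log.commit("guideline-A", "first revision (placeholder text)", committed_at=8 * 60)
log.commit("guideline-A", "second revision (placeholder text)", committed_at=9 * 60 + 47)

hit = log.latest("guideline-A", as_of=9 * 60 + 48)
print(hit.source, hit.version, hit.committed_at)  # guideline-A 2 587
```

Because the log is append-only, a query replayed with an earlier as_of value returns the version that was current at that moment, which is the property an audit needs.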

Results:

Evidence-set violation at 15 minutes: 1.3% (vs 2.4% baseline)
Cost: $0.68 per 1k queries — 42% cheaper than adaptive TTL
At 10M documents / 16 nodes: 1,856 queries/sec, p99 latency 2.14s
Clinical correctness: 0.884 on an 800-question physician-rated safety test

Relevance to TriClick: EviDex is the foundation of how TriClick handles evolving clinical evidence. When a guideline changes at 9:47 AM, the agent retrieving it at 9:48 sees the new version, cites it, and leaves an audit trail.

MLSys 2026 — One Paper Accepted

MLSys is the top venue for machine learning systems research. The conference takes place in Bellevue this May.

Ontology-Guided Long-Term Memory for Conversational RAG

“Ontology-Guided Long-Term Memory for Conversational RAG” — Dr. Jack Li.

Most RAG systems work for single-turn questions but fall apart in long, multi-session conversations. Dense retrieval recall dropped from 0.61 on early turns to 0.28 after turn 60 in the paper’s benchmarks — not because earlier evidence was irrelevant, but because vector similarity couldn’t bridge the conceptual gap.

The solution: extract durable user facts into a lightweight ontology memory graph, enrich queries with conversational signals, and route between graph-first and dense-first retrieval with a budget-aware learnable router.
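
A rough sketch of budget-aware routing, under stated assumptions: the feature names, weights, bias, and per-path costs below are invented for illustration, and the weights are hand-set here, whereas the paper's router is learned.

```python
def route(features, weights, bias, budget, cost):
    """Choose 'graph' or 'dense' retrieval, or 'none' if neither fits the budget."""
    # Linear score; score >= 0 is the same decision as sigmoid(score) >= 0.5.
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    preferred = "graph" if score >= 0 else "dense"
    fallback = "dense" if preferred == "graph" else "graph"
    # Take the preferred path if affordable, otherwise try the other one.
    for choice in (preferred, fallback):
        if cost[choice] <= budget:
            return choice
    return "none"


# Hypothetical hand-set weights; a learned router would fit these from data.
weights = {"turn_index": 0.05, "mentions_user_fact": 2.0}
cost = {"graph": 1.0, "dense": 3.0}

late_turn = {"turn_index": 60, "mentions_user_fact": 1.0}
early_turn = {"turn_index": 2, "mentions_user_fact": 0.0}

print(route(late_turn, weights, bias=-2.5, budget=5.0, cost=cost))   # graph
print(route(early_turn, weights, bias=-2.5, budget=5.0, cost=cost))  # dense
print(route(early_turn, weights, bias=-2.5, budget=2.0, cost=cost))  # graph
```

The third call shows the budget at work: dense retrieval is preferred for that query, but it does not fit the remaining budget, so the router falls back to the graph path.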

Results:

Recall@10: 0.70 (vs 0.58 for dense-only)
47% reduction in cross-modality disagreement
81% cost reduction compared to long-context methods

Relevance to TriClick: An AI working with a research team across dozens of study design sessions needs to remember that the sponsor mentioned progression-free survival as their endpoint in session 3 — even 40 sessions later. Ontology-guided memory makes this possible.

JAMIA Open — Journal Publication

JAMIA Open is the open-access journal of the American Medical Informatics Association, published by Oxford University Press.

ClinicalMind: Real-Time Clinical Analytics at Scale

“Real-time clinical analytics at scale: a platform built on large language models-powered knowledge graphs” — now published and available open access.

The paper describes ClinicalMind, the knowledge graph layer underneath TriClick. The system initializes from 300 curated authoritative sources — NCCN Oncology Guidelines, AHA Clinical Practice Guidelines, FDA Prescribing Information, specialty textbooks — providing 80-90% of core medical concepts upfront and reducing LLM invocation costs by approximately 70%.

A two-phase update strategy handles continuous document ingestion: Phase 1 extracts incremental information from new documents, and Phase 2 applies updates with conflict detection, temporal tagging, and validation against SNOMED CT, UMLS, ICD-10-CM, and RxNorm.
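
The two phases can be sketched as follows. This is a simplified illustration, not ClinicalMind's code: the Fact triple, the single_valued conflict rule, and the small vocabulary set (standing in for validation against terminologies such as SNOMED CT or RxNorm) are all assumptions, and the facts are placeholders rather than clinical data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


def phase1_extract(doc_facts, graph):
    """Phase 1: keep only facts the graph does not already hold."""
    return [f for f in doc_facts if f not in graph]


def phase2_apply(new_facts, graph, vocabulary, single_valued, today):
    """Phase 2: validate, detect conflicts, then add facts with a date tag."""
    applied, rejected, conflicts = [], [], []
    for f in new_facts:
        # Validation: both endpoints must be known vocabulary terms.
        if f.subject not in vocabulary or f.obj not in vocabulary:
            rejected.append(f)
            continue
        # Conflict detection: a single-valued relation already has another value.
        if f.relation in single_valued and any(
            g.subject == f.subject and g.relation == f.relation and g.obj != f.obj
            for g in graph
        ):
            conflicts.append(f)
            continue
        graph[f] = today  # temporal tag: when the fact entered the graph
        applied.append(f)
    return applied, rejected, conflicts


vocabulary = {"drug-A", "class-X", "class-Y", "condition-B"}
graph = {Fact("drug-A", "belongs_to", "class-X"): "2025-01-01"}
doc_facts = [
    Fact("drug-A", "belongs_to", "class-X"),   # already known
    Fact("drug-A", "belongs_to", "class-Y"),   # conflicts with class-X
    Fact("drug-A", "treats", "condition-B"),   # new and valid
    Fact("drug-A", "treats", "unknown-term"),  # fails validation
]

new_facts = phase1_extract(doc_facts, graph)
applied, rejected, conflicts = phase2_apply(
    new_facts, graph, vocabulary, {"belongs_to"}, today="2026-01-15")
print(len(new_facts), len(applied), len(rejected), len(conflicts))  # 3 1 1 1
```

Splitting extraction from application means the known fact is filtered out in Phase 1 at almost no cost, while the conflicting and unvalidated facts are held back in Phase 2 instead of being written into the graph.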

Results:

110,000 clinical documents + 60,000 EMRs processed
1.5 million core concepts, 3 million primary relationships
Average query delay: 1.7 seconds
BLEU: 0.85, ROUGE: 0.92

DOI: 10.1093/jamiaopen/ooaf167

The Bigger Picture

These five papers represent the research foundation that TriClick is built on:

Contract-Checked Editing ensures reasoning pipelines produce structurally valid outputs
HyperWorld gives AI agents the ability to plan before acting in constrained environments
EviDex keeps clinical evidence fresh and auditable in real time
Ontology-Guided Memory maintains context across long multi-session interactions
ClinicalMind provides the knowledge graph that every other layer reasons over

Combined with three AAAI 2026 presentations in January (CSLAN, Dynamic Consistency Index, LLM hallucination reduction), Hill Research has eight papers at premier venues in 2026, spanning ACL, AAAI, SIGMETRICS, MLSys, and JAMIA Open.

Learn More

ClinicalMind — JAMIA Open (Full Text)
AAAI 2026 SPARTA Workshop Presentations
For partnership inquiries, contact: info@hillresearch.ai