Healthcare organizations generate immense volumes of data, but an estimated 95% of it goes unused, largely because it is fragmented, inaccessible, and unstructured. Advances in AI present an opportunity to clean and transform massive streams of healthcare data, making it available for research, innovation, and patient care. The Truveta Language Model (TLM), a large language model trained on medical records data, cleans billions of electronic health record (EHR) data points provided by 30 member health systems to enable innovative healthcare research.

This whitepaper provides insight into:

What TLM is, how it works, and what sets it apart from other large language models

How TLM handles the ingestion and cleaning of healthcare data to ensure high-quality inputs for research

The TLM training process and the clinical experts involved

How TLM extracts and normalizes concepts contained within clinical notes

