Large language models (LLMs) have the potential to democratize access to medical knowledge. Unfortunately, the enormous potential of these models is either locked behind commercial or research-only licenses, constrained by privacy regulations, limited in scale, or not generalizable to underserved populations and resource-limited settings.
To address this issue, we developed Meditron-70B, currently the world’s best-performing fully open-source chatbot for medicine, trained on carefully curated clinical practice guidelines from diverse settings.
However, the performance of such chatbots is commonly measured on medical exam questions, which do not adequately reflect real-world clinical utility and safety.
In this talk, I introduce Meditron and show how we are crowdsourcing incentivized expert evaluations that put Meditron to the test. I introduce the MOOVE (Massive Online Open Validation and Evaluation) platform, which allows doctors to validate the real-world performance of Meditron in terms of helpfulness, harmlessness, bias, trust, and safety. In return for this rigorous validation, participants can get their own chatbot, adapted to their preferences and specialty.