Vast amounts of medical knowledge are contained in research papers, medical records, and hospital databases. However, integrating and querying these distributed data sources is difficult and often requires advanced data management technology and skills to organize and analyze the data properly. As a result, medical researchers without deep technical knowledge are often unable to analyze these information sources efficiently.
In this talk, we explain how to integrate and query both unstructured data from text documents, such as articles indexed in PubMed, and structured data from databases containing biomarker information. The basic idea is to use machine learning techniques to extract information from research articles and then automatically integrate it into medical databases to enrich existing knowledge bases. Finally, we use neural network-based transformer architectures to translate natural language questions into a database query language. The advantage of this approach is that medical researchers, for instance those studying cancer, can intuitively query large amounts of data without any technical knowledge of the underlying databases and without a Ph.D. in computer science or math.
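The three steps described above (extract, integrate, translate) can be illustrated with a deliberately simplified sketch. It is not the system presented in the talk: a regular expression stands in for the machine-learning extraction model, a question template stands in for the transformer-based translation, and the sentences, table name `biomarker`, and function names are all hypothetical choices made for illustration.

```python
import re
import sqlite3

# Hypothetical sentences standing in for text taken from research articles.
ARTICLES = [
    "BRCA1 is associated with breast cancer.",
    "KRAS is associated with lung cancer.",
]

# Toy pattern; a real pipeline would use a learned extraction model instead.
FACT_PATTERN = re.compile(r"(\w+) is associated with ([\w ]+)\.")


def extract_facts(sentences):
    """Step 1: pull (biomarker, disease) pairs out of free text."""
    facts = []
    for sentence in sentences:
        match = FACT_PATTERN.search(sentence)
        if match:
            facts.append((match.group(1), match.group(2)))
    return facts


def build_database(facts):
    """Step 2: merge extracted facts into an existing structured table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biomarker (name TEXT, disease TEXT)")
    # One made-up row that plays the role of pre-existing structured data.
    conn.execute("INSERT INTO biomarker VALUES ('TP53', 'lung cancer')")
    conn.executemany("INSERT INTO biomarker VALUES (?, ?)", facts)
    return conn


def question_to_sql(question):
    """Step 3: template stand-in for transformer-based question translation."""
    match = re.match(
        r"which biomarkers are associated with ([\w ]+)\?", question.lower()
    )
    if match is None:
        raise ValueError("question not understood by this toy translator")
    sql = "SELECT name FROM biomarker WHERE disease = ? ORDER BY name"
    return sql, (match.group(1),)


def answer(conn, question):
    """Translate the question, run the query, and return the biomarker names."""
    sql, params = question_to_sql(question)
    return [row[0] for row in conn.execute(sql, params)]
```

With these assumptions, `answer(build_database(extract_facts(ARTICLES)), "Which biomarkers are associated with lung cancer?")` returns the extracted biomarker together with the pre-existing one, so the researcher never writes SQL or needs to know the table layout.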