Wisecube: Enhancing Biomedical Knowledge Graphs with Lettria’s Text-to-Graph

Discover how Lettria transforms unstructured biomedical text into structured knowledge to enhance Wisecube’s medical knowledge graph.

108 GB of text
350,000 articles processed
500+ classes monitored
3,500 triples extracted

About Wisecube

Wisecube is a company specializing in artificial intelligence (AI) solutions tailored for the life sciences industry. Their mission is to unify and synthesize private and public data, revealing hidden insights to accelerate life science research.

Introduction

In biomedical research, knowledge graphs have emerged as powerful tools for organizing and leveraging vast and complex datasets. They enable the integration of diverse data sources, such as molecular interactions, pharmacological datasets, and clinical records, into a structured format that facilitates efficient information retrieval and automated knowledge discovery.

By implementing knowledge graphs, researchers can navigate intricate biomedical data more effectively, leading to advancements in precision medicine, drug discovery, and overall scientific research. This structured approach allows for the identification of novel insights and accelerates the development of new therapeutic strategies.

Wisecube's expertise in developing AI platforms that integrate knowledge graphs positions them as a valuable partner in the biomedical research community, contributing to the acceleration of scientific discoveries and the improvement of patient outcomes.

For more information, see the article on how Wisecube is Accelerating Biomedical Innovation by Combining NLP and Knowledge Graphs.

Challenge: integrating complex biomedical data into an existing graph

Wisecube is building a cutting-edge biomedical knowledge graph, leveraging a Wikidata biomedical subset as its foundation. The challenge was to expand this graph by integrating valuable scientific insights from PubMed abstracts, extracting biomedical relationships, biomarkers, and other entities.

However, structuring biomedical text presents significant hurdles:

Unstructured Data Complexity: PubMed abstracts contain dense scientific text, making automated structuring difficult.
Scalability: The Biolink Model, used for ontology alignment, is large (~200K tokens), exceeding standard LLM processing limits.
Accuracy and Integration: Extracted entities must align with Wikidata QIDs and integrate effortlessly into Wisecube’s existing knowledge graph (KG).

Wisecube needed an efficient, scalable, and ontology-driven approach to extract meaningful insights from scientific literature while ensuring compatibility with its knowledge graph.

[Figure: example of an ingested PubMed abstract]

Solution: Lettria’s AI-powered Text-to-Graph Pipeline

Lettria implemented a cutting-edge Text-to-Graph pipeline designed to extract and structure knowledge from biomedical text, and integrate it into Wisecube’s Knowledge Graph.

Overcoming Scalability Challenges

The Biolink Model is an open-source data model designed to standardize types and relationships in biological knowledge graphs. It is vast and cannot be processed efficiently in its entirety. Lettria innovated by modularizing Biolink into 10 self-sufficient ontologies, each focused on specific biomedical domains such as diseases, molecular entities, and chemicals.

LLM-Assisted Modularization: A 1M-token context-size LLM split Biolink into coherent, processable segments.
Efficient Processing: Each abstract was processed sequentially using these modules, ensuring speed and accuracy.
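
The sequential, module-by-module processing described above could be sketched roughly as follows. This is a minimal illustration and not Lettria's actual code: the module names, the placeholder ontology text, the `stub_extractor` function (which stands in for the LLM call), and the `GeneX`/`DiseaseY` labels are all hypothetical. The source states only that Biolink was split into 10 modules and that each abstract was run against them in turn.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Extractor = Callable[[str, str, str], List[Triple]]

# Hypothetical module names with placeholder contents. The source names
# diseases, molecular entities, and chemicals only as example domains.
MODULES: Dict[str, str] = {
    "diseases": "<disease module text>",
    "molecular_entities": "<molecular entity module text>",
    "chemicals": "<chemical module text>",
}


def process_abstract(abstract: str, modules: Dict[str, str], extract: Extractor) -> List[Triple]:
    """Run one abstract through each ontology module in turn.

    Triples found by more than one module are kept only once,
    in the order they were first seen.
    """
    seen = set()
    merged: List[Triple] = []
    for name, ontology_text in modules.items():
        for triple in extract(abstract, name, ontology_text):
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


def stub_extractor(abstract: str, module_name: str, ontology_text: str) -> List[Triple]:
    """Stand-in for the LLM call: returns a canned triple for two modules."""
    if module_name in {"diseases", "molecular_entities"} and "DiseaseY" in abstract:
        return [("GeneX", "biolink:biomarker_for", "DiseaseY")]
    return []


if __name__ == "__main__":
    print(process_abstract("GeneX is elevated in DiseaseY.", MODULES, stub_extractor))
```

Keeping each call scoped to one module means a prompt carries only a fraction of the ontology, which is the point of the modularization. The deduplication step is an assumption about how overlapping results might be merged.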

Scalable Knowledge Extraction

Ontology-Based Processing: Leveraging the Biolink Model, Lettria ensured extracted relationships maintained semantic consistency.

Guided LLM Extraction: A carefully designed prompt instructed the LLM to extract relevant triples while adhering to ontology logic.
Triple Extraction Output: The final output was a structured RDF graph, aligned with Wisecube’s KG.
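
As a rough sketch of how extracted triples might be checked against an ontology module and written out as RDF, the example below filters triples by an allowed-predicate set and serializes the rest as N-Triples. The allowed-predicate set, the `https://example.org/entity/` base IRI, and the choice of N-Triples are assumptions made for illustration. The source says only that the output was a structured RDF graph.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]  # (subject label, Biolink predicate name, object label)

BIOLINK = "https://w3id.org/biolink/vocab/"   # Biolink Model vocabulary namespace
ENTITY_BASE = "https://example.org/entity/"   # hypothetical base IRI for extracted entities

# Assumed subset of predicates permitted by one ontology module.
ALLOWED_PREDICATES: Set[str] = {"biomarker_for", "treats", "related_to"}


def filter_valid(triples: Iterable[Triple], allowed: Set[str]) -> List[Triple]:
    """Drop triples whose predicate is not defined in the module."""
    return [t for t in triples if t[1] in allowed]


def entity_iri(label: str) -> str:
    """Turn an entity label into a simple, percent-encoded IRI."""
    return ENTITY_BASE + quote(label.strip().lower().replace(" ", "_"))


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(
        f"<{entity_iri(s)}> <{BIOLINK}{p}> <{entity_iri(o)}> ."
        for s, p, o in triples
    )


if __name__ == "__main__":
    raw = [("GeneX", "biomarker_for", "DiseaseY"), ("GeneX", "cures", "DiseaseY")]
    print(to_ntriples(filter_valid(raw, ALLOWED_PREDICATES)))
```

Validating predicates after extraction gives a cheap guard against output that drifts from the ontology, whatever the prompt instructed.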

Wikidata Entity Alignment

To integrate extracted knowledge into Wisecube’s KG, Lettria implemented an entity-mapping process:

Extract entity labels from RDF files.
Query Wikidata for potential matches.
Use an LLM for entity linking, evaluating label similarity, type compatibility and descriptions.
Enrich RDF files with statements linking each entity to its matched Wikidata QID, ensuring compatibility.
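
The candidate lookup and linking steps above might look something like the sketch below. It builds a query URL for the public Wikidata `wbsearchentities` API and ranks candidates with a simple heuristic. The heuristic (string similarity plus a bonus when a type hint appears in the candidate description) is only a stand-in for the LLM-based linking the source describes. The threshold, the bonus weight, and the candidate IDs are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(label: str, limit: int = 5) -> str:
    """Build a wbsearchentities query URL for an entity label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": "en",
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def score(label: str, type_hint: str, candidate: Dict[str, str]) -> float:
    """Heuristic stand-in for the LLM judgment: label similarity plus a type bonus."""
    similarity = SequenceMatcher(None, label.lower(), candidate["label"].lower()).ratio()
    description = candidate.get("description", "").lower()
    bonus = 0.2 if type_hint.lower() in description else 0.0
    return similarity + bonus


def best_match(label: str, type_hint: str, candidates: List[Dict[str, str]],
               threshold: float = 0.8) -> Optional[str]:
    """Return the ID of the best-scoring candidate, or None if none is good enough."""
    if not candidates:
        return None
    top = max(candidates, key=lambda c: score(label, type_hint, c))
    return top["id"] if score(label, type_hint, top) >= threshold else None


if __name__ == "__main__":
    # Candidates shaped like wbsearchentities results; the IDs are placeholders.
    candidates = [
        {"id": "Q-EXAMPLE-1", "label": "asthma", "description": "chronic respiratory disease"},
        {"id": "Q-EXAMPLE-2", "label": "Asthma", "description": "music album"},
    ]
    print(search_url("asthma"))
    print(best_match("asthma", "disease", candidates))
```

Returning `None` below the threshold leaves an entity unlinked rather than attaching a wrong QID, which seems the safer default when the result is merged into an existing graph.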

Production-Ready Deployment

The structured RDF data was:

Stored in AWS S3, ensuring security and accessibility.
Ingested into AWS Neptune, allowing scalable graph storage and query execution.
Merged into a single knowledge graph, making biomedical insights easily retrievable.
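
For illustration, loading RDF files from S3 into Neptune is commonly done through the Neptune bulk loader HTTP endpoint. The sketch below only constructs such a request and does not send it. The endpoint host, bucket path, IAM role ARN, region, and the choice of Turtle as the RDF format are placeholder assumptions and are not details from the Wisecube deployment.

```python
import json
from urllib.request import Request


def build_loader_request(endpoint: str, s3_uri: str, iam_role_arn: str,
                         region: str, rdf_format: str = "turtle") -> Request:
    """Build (but do not send) a POST request for the Neptune bulk loader."""
    payload = {
        "source": s3_uri,            # S3 prefix holding the RDF files
        "format": rdf_format,        # RDF formats include turtle, ntriples, nquads, rdfxml
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read from the bucket
        "region": region,
        "failOnError": "FALSE",      # keep loading if a single file has errors
    }
    return Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # All values below are placeholders.
    req = build_loader_request(
        endpoint="neptune.example.internal",
        s3_uri="s3://example-bucket/rdf/",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
        region="us-east-1",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In a cluster with IAM database authentication enabled, the request would also need to be signed. That step is omitted here.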

AI's impact on biomedical knowledge graphs

High-Quality Biomedical Data Extraction

Lettria’s pipeline accurately extracted biomarkers, disease relationships and biomedical interactions from PubMed abstracts.

Scalable and Efficient Processing

The modularized Biolink approach allowed Wisecube to process large volumes of text efficiently, without compromising accuracy.

Seamless Integration with Wisecube’s Knowledge Graph

Entity mapping ensured structured knowledge aligned with Wikidata, improving interoperability with existing biomedical data.

Biomedical AI Roadmap

The development roadmap for the solution includes several strategic phases. In the short term, the focus is on expanding dataset coverage to process a greater volume of PubMed abstracts, while simultaneously exploring new applications such as drug efficacy prediction. The plan also includes automating and refining the entity mapping between Biolink labels and Wikidata QIDs, which will significantly improve data integration capabilities.

Looking toward the long-term vision, full integration with Wisecube's production pipeline is planned, along with a substantial broadening of scope to include full-text scientific articles rather than being limited to abstracts. Implementation of advanced query capabilities in AWS Neptune will unlock deeper and more nuanced biomedical insights for users.

Lettria: a leading AI partner for the biomedical field

Lettria's expertise enables it to offer advanced capabilities in text-to-graph extraction, uncovering hidden biomedical relationships beyond structured databases and providing deeper insights. With an ontology-driven approach powered by the Biolink Model, it ensures scientific accuracy and maintains data integrity. Its modular ontology processing and AWS Neptune deployment are designed for scalability and high performance, enabling efficient large-scale text processing.

With Lettria’s cutting-edge Text-to-Graph pipeline, Wisecube now has a powerful, scalable way to extract structured biomedical knowledge and enrich its knowledge graph, paving the way for new scientific discoveries.
