Introduction
GraphRAG (graph-based retrieval-augmented generation) solutions are becoming increasingly prevalent in modern AI systems. As these solutions grow in complexity and adoption, the need for robust evaluation and optimization methods becomes crucial. This article focuses on one critical aspect of GraphRAG systems: the relevance of the triples retrieved during the inference phase. The key questions we address are how relevant these triples are to the original query and how we can systematically assess their contribution to the response. And if an automatic evaluator can be set up, its own performance must be checked in turn.
Understanding GraphRAG inference architecture
What distinguishes Lettria's GraphRAG solution is its hybrid architecture that combines the power of vector embeddings with graph-based structures. During data ingestion, the system stores extracted information in both a vector database and a graph database. In the vector database, nodes (representing entities), edges (representing relations between the entities) and document chunks are encoded as high-dimensional vectors that capture semantic meaning, enabling efficient similarity-based retrieval. Simultaneously, these same entities and relations are stored in a graph database, preserving their complex interconnections and enabling sophisticated relation analysis. This dual storage approach creates a robust foundation for the inference phase.
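As a rough illustration of this dual-ingestion step, here is a minimal sketch. The `embed()`, `vector_db`, and `graph_db` interfaces are hypothetical stand-ins, since Lettria's internal API is not public; the structure, not the names, is the point.

```python
# Minimal sketch of dual ingestion: every chunk and relation is embedded
# into the vector database, while relations are also mirrored as edges
# in the graph database. All interfaces here are illustrative.

def ingest(chunks, triples, embed, vector_db, graph_db):
    # Index document chunks by their embeddings for similarity search.
    for chunk_id, text in chunks:
        vector_db.add(id=chunk_id, vector=embed(text), payload={"type": "chunk"})

    for head, relation, tail in triples:
        # Encode each relation as text so it can be retrieved semantically...
        vector_db.add(
            id=f"{head}-{relation}-{tail}",
            vector=embed(f"{head} {relation} {tail}"),
            payload={"type": "relation", "head": head, "tail": tail},
        )
        # ...and mirror it in the graph database to preserve structure.
        graph_db.add_edge(head, tail, label=relation)
```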
When a query is submitted, the inference phase follows a multi-step process. It starts by embedding the query into a vector that captures its semantic meaning. This vector is then used to query the vector database, retrieving the most relevant chunks along with a list of relations closely aligned with the query. These semantically similar relations form the foundation of the system's response.
The system then takes this information and expands it into a graph within the graph database, creating a richer network of related nodes and connections. In this graph, each unit formed by two entities and the relation linking them constitutes a triple. To ensure relevance, the expanded graph is filtered by cross-referencing it with the original vector data, keeping only the most meaningful and important triples. From the refined graph, the top K most relevant nodes are selected, and these are combined with the key triples identified earlier from the vector database.
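The sketch below summarizes this flow end to end. It reuses the hypothetical `embed`, `vector_db`, and `graph_db` interfaces from above, and the cosine-similarity filter is one plausible way to "cross-reference with the original vector data", not necessarily Lettria's exact criterion.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_context(query, embed, vector_db, graph_db, k=10):
    q_vec = embed(query)

    # Step 1: semantic retrieval of the most relevant chunks and relations.
    chunks = vector_db.search(q_vec, filter={"type": "chunk"}, top_k=k)
    seeds = vector_db.search(q_vec, filter={"type": "relation"}, top_k=k)

    # Step 2: expand the seed relations into a richer subgraph of
    # neighbouring nodes and connections (triples returned as dicts here).
    entities = {t["head"] for t in seeds} | {t["tail"] for t in seeds}
    expanded = graph_db.expand(entities)

    # Step 3: keep only expanded triples whose text stays close to the
    # query vector, then select the top-K most relevant ones.
    scored = [(t, cosine(q_vec, embed(f"{t['head']} {t['label']} {t['tail']}")))
              for t in expanded]
    scored.sort(key=lambda ts: ts[1], reverse=True)
    kept = [t for t, _ in scored[:k]]

    return chunks, seeds + kept
```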
This approach allows Lettria's GraphRAG to seamlessly merge the power of vector-based semantic understanding with the structural depth of the graph database, delivering highly relevant and precise results.
Triples evaluator
To assess the quality of retrieved triples, we developed an automated evaluation system using an LLM (large language model). The evaluator classifies the triples used in response generation into three distinct categories (a minimal sketch of the classifier follows the list):
– Relevant: triples that directly contribute to answering the query, particularly when all three elements align with query components.
– Indirectly relevant: triples that provide supporting context or intermediate steps, typically forming part of a larger solution path.
– Irrelevant: triples that have no meaningful connection to the query.
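The classification itself can be implemented as a single LLM call per triple, along these lines. Here `llm` stands in for any chat-completion function, and the prompt wording is an illustration, not our production prompt.

```python
# Hypothetical sketch of the triple evaluator; llm(prompt) -> str is a
# stand-in for any chat-completion wrapper, not a specific vendor API.

EVAL_PROMPT = """You are grading a knowledge-graph triple retrieved for a query.
Classify the triple into exactly one category:
- relevant: directly contributes to answering the query.
- indirectly relevant: supporting context or an intermediate step.
- irrelevant: no meaningful connection to the query.

Query: {query}
Triple: ("{head}", "{relation}", "{tail}")
Answer with the category name only."""

def classify_triple(llm, query, head, relation, tail):
    answer = llm(EVAL_PROMPT.format(query=query, head=head,
                                    relation=relation, tail=tail))
    return answer.strip().lower()
```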
However, using an LLM as a judge to assess the performance of an LLM-based solution is not without risk. That's why our evaluation team carried out an analysis to check the quality of our evaluator.
Evaluation process
Our evaluation focused on assessing the quality of the triple classification model across four distinct professional domains: finance, healthcare, industry, and legal. For each domain, we used comprehensive datasets consisting of multiple question-answer pairs, all carefully crafted and validated by experts.
This approach allowed us to evaluate the model's performance across different types of queries within each domain, from straightforward factual questions to more complex analytical queries. By using human-verified question-answer pairs, we established a reliable ground truth for assessing the relevance of retrieved triples.
The evaluation process relied on a manual review by a team of domain experts, who verified whether the triples identified in the relevant and indirectly relevant categories were correctly classified. If a triple was accurately placed in its category, it was marked with a “Yes.” If a triple was incorrectly classified, it received a “No,” along with the correct category where it should have been assigned. For instance, if a triple belonged in the irrelevant category, it was noted as: “No = irrelevant”.
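From these annotations, per-category accuracy is a straightforward tally. The sketch below shows one way to compute it; the data layout is illustrative, not the format our reviewers actually used.

```python
from collections import Counter

def accuracy_by_category(annotations):
    # annotations: list of (predicted_category, verdict) pairs, where
    # verdict is "Yes" or "No = <correct category>".
    hits, totals = Counter(), Counter()
    for predicted, verdict in annotations:
        totals[predicted] += 1
        if verdict.strip().lower() == "yes":
            hits[predicted] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

reviews = [("relevant", "Yes"), ("indirectly relevant", "No = irrelevant"),
           ("relevant", "Yes")]
print(accuracy_by_category(reviews))  # {'relevant': 1.0, 'indirectly relevant': 0.0}
```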
Results
Our evaluation across different domains showed promising results.

The overall system performance is satisfactory:
– Accuracy for relevant triples: 86.97%.
– Accuracy for indirectly relevant triples: 86.45%.
– Overall accuracy: 86.71%.
The evaluation results demonstrate strong performance across all tested domains, with the industry sector showing exceptional accuracy at 95.77%. The system maintained consistent performance across both relevant and indirectly relevant classifications, achieving an overall accuracy of 86.71%.
Notably, only 15% of the total evaluated triples were classified as either relevant or indirectly relevant, indicating effective filtering of non-essential information. This combination of high accuracy and efficient filtering suggests that the system is well-suited for practical applications in real-world scenarios. Though the evaluator may not reach perfect accuracy, its performance greatly speeds up iteration cycles in the development and refinement of Lettria's GraphRAG solution, making it an invaluable asset for continuous optimization.
Limitations and challenges
Three main challenges emerged during the evaluation process.
The first is ambiguity in indirect relevance. Determining indirect relevance often requires assumptions about the solution path, and the complete reasoning chain isn't always visible in the final response, so it isn't always easy to decide whether a triple belongs in this category.
The second is a lack of clarity in relation names. Triples are made up of two entities linked by a relation. While entity names are generally extracted from the ingested data, relation names are often created by the model that produced the triples, and some of them lack explicit meaning, as in these examples:
"material/aluminum alloys" dominant {} "concept/aircraft performance"
"material/magnesium" comparison {} "concept/CFRP"
"vaccine/COVID-19 vaccine" unclear {} "concept/mucosal immune response"
As a result, it is sometimes difficult to assess the relevance of the triples in question. Using an ontology with predefined property names would of course help, but this isn't always possible.
The third challenge is directional ambiguity. In some cases, the meaning of the relation is understandable but its directionality is unclear: it's the role of each entity in the relation that's hard to determine. For example, if a relation is called inclusion, does that mean that A includes B, or that A is included in B? This problem, sometimes caused simply by the absence of a preposition in the relation name, makes certain triples difficult to classify.
Despite these challenges, the evaluation team maintained high confidence in their classifications, as these issues appeared relatively infrequently in the dataset. What's more, these problems are potentially solvable. For example, prompts could be created to instruct the model to make relation names more explicit during graph extraction.
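For instance, an extraction prompt could include naming rules along these lines. The wording below is a hypothetical illustration, not Lettria's production prompt.

```python
# Hypothetical prompt fragment addressing the second and third challenges:
# explicit, directional relation names during graph extraction.
RELATION_NAMING_RULES = """When extracting (head, relation, tail) triples:
- Name the relation with an explicit verb phrase such as "outperforms" or
  "is included in", never a bare noun like "comparison" or "inclusion".
- Make the direction unambiguous: the triple must read naturally as
  "<head> <relation> <tail>".
- If the source text does not state the relation clearly, flag the triple
  for review rather than inventing a name."""
```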
Conclusion
The triples evaluator demonstrates strong potential for improving GraphRAG systems, achieving an overall accuracy of 86.71% across diverse professional domains while effectively filtering out 85% of irrelevant information. These results, particularly the exceptional 95.77% accuracy in the industry domain, validate the evaluator's capability to enhance the precision of knowledge retrieval in practical applications. While challenges remain in areas such as relation naming clarity, the system's robust performance across different domains establishes it as a valuable tool for optimizing Lettria’s GraphRAG solution in professional environments.