Organize Your Data: Auto-Generated Knowledge Graphs with Neo4j and Generative AI

by | Nov 1, 2023 | AI, Software Development

Introduction

In a world where data is king, the ability to harness unstructured data is a game-changer. Neo4j, a leading graph database, coupled with Google Cloud’s Generative AI, is pioneering this transformation.

Neo4j and Generative AI: Bridging the Structured-Unstructured Divide

Neo4j facilitates the creation of knowledge graphs, offering a structured view of data. Google Cloud’s Generative AI, on the other hand, sifts through unstructured data, identifying crucial entities and relationships. Integrated, they automate the conversion of unstructured data into a structured, queryable format, which changes how data is managed in sectors like manufacturing and supply chain management.

Typical use cases in which Google already uses this pattern are, according to its own blog:

  • Healthcare – Modeling the patient journey for multiple sclerosis to improve patient outcomes
  • Manufacturing – Using generative AI to collect a bill of materials that extends across domains, something that wasn’t tractable with previous manual approaches
  • Oil and gas – Building a knowledge base with extracts from technical documents that users without a data science background can interact with. This enables them to more quickly educate themselves and answer questions about the business.

Automating the Extraction Process

Traditionally, extracting meaningful information from unstructured data to build knowledge graphs has been a manual, time-consuming task. However, with Generative AI, this process is automated. The AI identifies key entities and relationships, translating them into the Cypher query language for Neo4j, streamlining data storage and querying.
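To make the entity-to-Cypher step more concrete, here is a minimal sketch. It assumes the model has already returned its findings as JSON and lets plain code build the Cypher, which is a variation on having the model write Cypher directly. The JSON shape, the sample entities, and the helper names safe_identifier and to_cypher are all hypothetical and chosen only for illustration.

```python
import json
import re

# Hypothetical LLM output: entities and relationships extracted from text.
# The JSON shape is an assumption for this sketch, not a fixed format.
llm_output = """
{
  "entities": [
    {"id": "p1", "label": "Product", "name": "Pump X200"},
    {"id": "s1", "label": "Supplier", "name": "Acme Corp"}
  ],
  "relationships": [
    {"source": "s1", "type": "SUPPLIES", "target": "p1"}
  ]
}
"""

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def safe_identifier(name):
    """Labels and relationship types cannot be query parameters, so validate them."""
    if not IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def to_cypher(extraction):
    """Return a list of (query, parameters) pairs, one per entity or relationship."""
    statements = []
    for entity in extraction["entities"]:
        label = safe_identifier(entity["label"])
        statements.append((
            f"MERGE (n:{label} {{id: $id}}) SET n.name = $name",
            {"id": entity["id"], "name": entity["name"]},
        ))
    for rel in extraction["relationships"]:
        rel_type = safe_identifier(rel["type"])
        statements.append((
            f"MATCH (a {{id: $source}}), (b {{id: $target}}) "
            f"MERGE (a)-[:{rel_type}]->(b)",
            {"source": rel["source"], "target": rel["target"]},
        ))
    return statements


statements = to_cypher(json.loads(llm_output))
for query, params in statements:
    print(query, params)
```

Using MERGE rather than CREATE keeps the load idempotent, so running the same extraction twice should not duplicate nodes. Values travel as query parameters, while labels and relationship types are validated first because Cypher does not accept them as parameters.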

Neo4j: Query Knowledge Graphs with LLMs

Enhancing Search Capabilities

Neo4j recently introduced vector search to support semantic search and generative AI applications. The feature gives better access to unstructured data such as text and images, making the knowledge graph more useful overall.

By building vector search into the database, Neo4j changes how semantic search and generative AI applications handle unstructured data. Vector search represents each piece of unstructured data as a numerical vector (an embedding), so that items with similar meaning end up close together and can be found by similarity rather than by exact keyword match.

This not only speeds up the retrieval process but also boosts the relevancy and accuracy of search results. By making vector search a core feature, Neo4j addresses the need for more nuanced and intelligent data handling, ensuring that even non-recent data informs AI models and semantic searches. This update reflects a growing trend among database vendors to enhance their offerings with AI-driven features, responding to the demand for better, faster, and more accurate data insights.
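The idea behind vector search can be illustrated in a few lines of plain Python. This is a toy sketch and not how Neo4j implements its index: the three-dimensional vectors and document names are made up, and every vector is compared by brute force, whereas a real vector index is designed to avoid scanning all of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings; real models produce hundreds or thousands of dimensions.
documents = {
    "pump manual": [0.9, 0.1, 0.0],
    "valve datasheet": [0.7, 0.3, 0.1],
    "holiday schedule": [0.0, 0.2, 0.9],
}


def top_k(query_vector, k=2):
    """Score every document against the query and keep the k best matches."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in documents.items()
    ]
    return sorted(scored, reverse=True)[:k]


results = top_k([1.0, 0.0, 0.0])
for score, name in results:
    print(f"{name}: {score:.3f}")
```

The two technical documents rank ahead of the unrelated one because their vectors point in nearly the same direction as the query, even though no keywords are compared.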

Real-world Applications: Beyond Theory

Many large enterprises and SMBs have already used Neo4j on Google Cloud for diverse AI use cases, ranging from anti-money laundering to personalized recommendations, supply chain management, and more. These real-world applications demonstrate the practical value and versatility of combining Neo4j with Generative AI.

Enterprise customers can now leverage knowledge graphs with Google’s large language models to make generative AI outcomes more accurate, transparent, and explainable

Neo4j, June 7, 2023

To enhance the capabilities of Large Language Models (LLMs), Neo4j can be integrated into orchestration frameworks such as LangChain and LlamaIndex. By adding and indexing vector embeddings directly into Neo4j’s knowledge graph, the system can generate user input embeddings and utilize similarity search to find and retrieve relevant nodes and their contextual information. This enriched context is then used to prompt LLMs—whether cloud-based or local—to provide natural language searches that are grounded with specific, contextual information from the knowledge graph, enhancing the accuracy and relevance of the LLM’s output.

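from neo4j import GraphDatabase
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate

# placeholder connection details and question (assumptions); replace with your own
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
user_input = "Which waterproof hiking boots do customers like best?"

emb = OpenAIEmbeddings() # VertexAIEmbeddings() or BedrockEmbeddings() or ...
llm = ChatOpenAI() # ChatVertexAI() or BedrockChat() or ChatOllama() ...

# turn the user question into an embedding vector
vector = emb.embed_query(user_input)

vector_query = """
// find products by similarity search in vector index
CALL db.index.vector.queryNodes('products', 5, $embedding) YIELD node AS product, score

// enrich with additional explicit relationships from the knowledge graph
MATCH (product)-[:HAS_CATEGORY]->(cat), (product)-[:BY_BRAND]->(brand)
MATCH (product)-[:HAS_REVIEW]->(review {rating:5})<-[:WROTE]-(customer)

// return relevant contextual information
RETURN product.Name, product.Description, brand.Name, cat.Name,
       collect(review { .Date, .Text })[0..5] AS reviews, score
"""

# execute_query returns the records together with a summary and the result keys
records, summary, keys = driver.execute_query(vector_query, embedding=vector)
# simple stand-in for a context formatter: one line of key/value data per record
context = "\n".join(str(record.data()) for record in records)

template = """
You are a helpful assistant that helps users find information for their shopping needs.
Only use the context provided, do not add any additional information.
Context:  {context}
User question: {question}
"""

# build the prompt from the template and pipe it into the chat model
chain = ChatPromptTemplate.from_template(template) | llm

answer = chain.invoke({"question": user_input, "context": context}).content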

Conclusion

The synergy between Neo4j and Generative AI is not just a theoretical concept but a practical solution to the age-old problem of managing unstructured data. By automating the extraction process and enhancing usability, this combination is paving the way for industries to unlock the full potential of their data, driving better decisions and optimized operations. You can read about this combination in this great article by Google, where you will build an investment chatbot with a few lines of code!

Automation of data extraction and storage with Neo4j and Generative AI, making unstructured data a valuable asset for informed decision-making in various industrial domains.
Your own financial chatbot, which leverages knowledge graphs by combining Neo4j with an LLM