AWS debuts advanced RAG features for structured, unstructured data



Getting enterprise data into large language models (LLMs) is a critical task for enabling the success of enterprise AI deployments. 

That’s where retrieval-augmented generation (RAG) fits in, an area where many vendors offer solutions. Today at AWS re:Invent 2024, the company announced a series of new services and updates designed to make it easier for enterprises to get both structured and unstructured data into RAG pipelines. Making structured data accessible for RAG requires more than just looking up a single row in a table. It involves translating natural language queries into complex SQL queries that filter, join tables and aggregate data. The challenges are further compounded for unstructured data, which by definition has no predefined structure.
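To make the structured-data problem concrete, here is a minimal sketch of the kind of SQL a natural language question has to be translated into. It uses Python's built-in sqlite3 module; the tables, columns, sample rows and the question itself are hypothetical and invented for illustration, and nothing here reflects how AWS generates queries.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, order_date TEXT);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC'), (3, 'EMEA');
    INSERT INTO orders VALUES
        (1, 1, 100.0, '2024-01-15'),
        (2, 2, 250.0, '2024-02-01'),
        (3, 3, 50.0,  '2024-03-10'),
        (4, 1, 75.0,  '2023-12-31');
""")

# Question: "What was total order revenue per region in 2024?"
# Answering it needs a filter (WHERE), a join (JOIN) and an
# aggregation (SUM ... GROUP BY), not a single-row lookup.
sql = """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.order_date >= '2024-01-01' AND o.order_date < '2025-01-01'
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

In a RAG pipeline, rows like these would be passed to the model as context for its answer. The hard part, which this sketch skips, is producing the SQL automatically from the question and the schema.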

To help solve those challenges, AWS announced new services for structured data retrieval support, ETL (extract, transform and load) for unstructured data, data automation and knowledge base support.

“Retrieval augmented generation (RAG) is a very popular technique for customizing your data, but one of the challenges with retrieval augmented generation is it’s historically been mostly for text data,” Swami Sivasubramanian, VP of AI and Data at AWS, told VentureBeat. “And if you see enterprises, most of the data, especially operational, is sitting in data lakes and data warehouses, and that has never been ready for RAG, per se.”

Improving structured data retrieval support with Amazon Bedrock Knowledge Bases

Why isn’t structured data ready for RAG? Sivasubramanian provided a few scenarios.

“To build a highly accurate, secure system, you’ve got to actually understand the schema, build a custom schema embedding, and then actually understand the historical query log, and then keep up with the changes and schemas,” Sivasubramanian said.

During his keynote at re:Invent, Sivasubramanian explained that the Amazon Bedrock Knowledge Bases service is a fully managed RAG capability that enables enterprises to customize responses with contextual and relevant data.

“It automates the complete RAG workflow, removing the need for you to write custom code to integrate your data sources and manage queries,” he said.

With structured data retrieval support in Amazon Bedrock Knowledge Bases, Sivasubramanian said that AWS is providing a fully managed RAG solution. It enables enterprises to natively query all their structured data to generate results for generative AI applications. Knowledge Bases will automatically generate and execute the SQL queries to retrieve enterprise data and then enrich the model’s responses.

“The cool thing is, it also adjusts to your schema and data, and it learns from your query patterns and provides the customization options for enhanced accuracy,” he said. “Now with the ability to easily access structured data for your RAG, you will generate more powerful and intelligent gen AI applications in the enterprise.”

GraphRAG: Bringing it all together in a knowledge graph

Another key enterprise AI challenge that AWS is looking to solve for RAG is improving accuracy as the number of data sources grows. That’s what the new GraphRAG capability aims to address.

“One of the big challenges in enterprises is to piece apart distinct pieces of data and show how they are connected so that you can build explainable RAG systems,” Sivasubramanian said. “This is where knowledge graphs are super important.”

Sivasubramanian explained that knowledge graphs create relationships across multiple data sources by connecting different pieces of information.

“When these relationships are converted into graph embeddings for your gen AI applications, the system can easily traverse this graph and retrieve these connections to gather a holistic view of your customer data,” he said.

The new GraphRAG capabilities in Amazon Bedrock Knowledge Bases automatically generate graphs using the Amazon Neptune graph database service. Sivasubramanian noted that it links the relationships between various data sources, creating more comprehensive gen AI applications without the need for any graph expertise.
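The idea of traversing a knowledge graph to gather connected facts can be sketched in a few lines of plain Python. This is an illustration only: the node names, relations and the related_facts helper are hypothetical, the graph is an in-memory dictionary rather than Amazon Neptune, and it uses a simple breadth-first walk instead of the graph embeddings Sivasubramanian described.

```python
from collections import deque

# Hypothetical knowledge graph: node -> list of (relation, neighbor).
graph = {
    "customer:42": [("placed", "order:1001"), ("opened", "ticket:7")],
    "order:1001": [("contains", "product:widget")],
    "ticket:7": [("about", "product:widget")],
    "product:widget": [],
}

def related_facts(graph, start, max_hops=2):
    """Breadth-first walk that collects (source, relation, target) triples
    reachable from `start` within `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts

facts = related_facts(graph, "customer:42")
for source, relation, target in facts:
    print(f"{source} -[{relation}]-> {target}")
```

The collected triples show why a graph helps with explainability: the order and the support ticket both point at the same product, a connection that separate lookups in two data sources would not surface on their own.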

Tackling the challenges of unstructured data with Amazon Bedrock Data Automation

Another critical enterprise data challenge is the issue of unstructured data. It’s an issue that many vendors are trying to solve, including startups like Anomalo.

When data, be it a PDF, audio or video file, needs to be indexed for RAG use cases, some understanding of what’s in the data is crucial to making it useful.

“Unfortunately, unstructured data is difficult to extract and it needs to be processed and transformed to make it ready,” Sivasubramanian said.

The new Amazon Bedrock Data Automation technology is AWS’ answer to that challenge. Sivasubramanian explained that the feature will automatically transform unstructured multimodal content into structured data to power gen AI applications.

“I like to think of this as a gen AI powered ETL [extract, transform and load] for unstructured data,” he said.

Amazon Bedrock Data Automation will automatically extract, transform and process an enterprise’s multimodal content at scale. He noted that with a single API, an enterprise can generate custom outputs aligned to data schemas and parse multimodal content for gen AI applications.
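As a rough illustration of what "outputs aligned to data schemas" means, the sketch below turns a free-text snippet into a record that matches a fixed schema. Everything in it is an assumption made for the example: the InvoiceRecord schema, the extract_invoice helper and the sample text are hypothetical, and simple regular expressions stand in for the model-driven extraction a service like Bedrock Data Automation would perform. It does not use or represent the AWS API.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class InvoiceRecord:
    """Hypothetical target schema for the extracted data."""
    invoice_id: Optional[str]
    total: Optional[float]
    currency: Optional[str]

def extract_invoice(text: str) -> InvoiceRecord:
    """Pull schema fields out of free text; missing fields become None."""
    id_match = re.search(r"Invoice\s+#?(\w+)", text)
    total_match = re.search(r"Total:\s*([A-Z]{3})\s*([\d,]+\.\d{2})", text)
    return InvoiceRecord(
        invoice_id=id_match.group(1) if id_match else None,
        total=float(total_match.group(2).replace(",", "")) if total_match else None,
        currency=total_match.group(1) if total_match else None,
    )

raw = "Invoice #A123 issued to Example Corp. Total: USD 1,250.00 due in 30 days."
record = asdict(extract_invoice(raw))
print(record)
```

Once content is in this shape, it can be loaded into a table or index like any other structured data, which is the "load" step of the ETL analogy.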

“With these updates, we are empowering you to harness all of your data to build contextually more relevant gen AI applications,” he said.


