By Richard Louden, Head of Technology (Data) at Nimble Approach
This blog explores how to evolve your data platform for the AI era – enabling it to serve both human users and provide AI-ready data – while building the foundations required to scale effectively.
Over the last 20 years, the value of organisational data has risen dramatically, becoming a foundation for driving growth and efficiency – fuelled by advances in BI / reporting, the data science boom, big data, and now AI. Each of these phases has brought an evolution in how data is governed, stored, and utilised, shaped by the needs of the associated processes and use cases – along with a shift in the underlying technology, as shown in figure 1.
The drive for greater organisational insight led to the rise of data warehouses, which have since evolved into lakehouses to better support the demands of big data and machine learning. However, whilst the underlying techniques and technologies have changed, the end user of the data – i.e. humans – has remained consistent. Throughout this era, data assets have been developed with people in mind, and as such have certain nuances:
- They are refined and aggregated to form patterns that can be clearly understood with minimal cognitive load.
- Little context is explicitly provided, instead being driven through organisational understanding.
- Quality is paramount, with inaccuracies damaging any trust in the end result.
For a long time, this approach has given organisations all they needed to understand the past and attempt to predict the future. With the advent of Large Language Models (LLMs) and agentic AI capabilities, however, it will need to adapt. The rapid advancement in the capability of AI models will drive the latest wave of growth and efficiency, as organisations look to automate processes that were previously too human-centric to be candidates.

Understand What AI Actually Needs From Your Data
So, why does the next wave of technological advancement require a shift in how we approach our data? At its core, the argument is that although LLMs are often described in terms analogous to the human brain – capable of reasoning and thinking – the way they absorb and use information is fundamentally different. In order to drive adoption of AI solutions, we must consider these differences and provide data in such a way that they can be most effective:
- AI can only infer based on the information it has. Providing appropriate organisational context is key to improving accuracy and limiting the risk of hallucinations, which can have serious negative consequences.
- LLMs do not suffer the cognitive load that humans face when assessing high volumes of data, so more data is generally better.
- Complete data is a necessity. Agents cannot intuitively ask questions to fill in the gaps, so missing data can significantly undermine the validity of any outputs.
- Meaning must be provided explicitly in a way that agents can ingest and utilise. Hierarchies, relationships, meanings, and constraints are key semantic elements that are often part of organisational knowledge and need to be made explicitly available.
Given these key differences, as highlighted in figure 2, organisations looking to embrace AI clearly need to consider how they enhance their data processes to enable this different way of working.

Build the Foundations For AI-Ready Data
“Our data works for our people, so we should be able to adapt it for AI” is a common assumption organisations make when preparing for AI adoption. This is driven by three key elements:
- The wish to adopt AI quickly.
- The pain of creating more processes to maintain.
- A belief that data for one purpose can be stretched to others – despite what has been discussed above.
However, while this may appear to be the most efficient path, it often traps organisations in a cycle of incremental half-measures that ultimately constrain the ROI of their AI initiatives. Instead, there needs to be a clear mandate for building out an AI-specific data capability that will serve your current needs and help you scale your AI adoption. To do this, organisations need to consider the differences between human- and AI-ready data discussed above, implementing changes in three key areas:
Data:
- Assess your data for completeness, quality, and timeliness and add steps to improve these where required. This will support LLMs to make decisions based on accurate and relevant information.
- Build pipelines to supply non-aggregated but cleansed data that accurately represents the source tables.
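To make the first point concrete, a batch of incoming records can be scored for completeness and timeliness before it reaches any AI application. The sketch below is a minimal, hypothetical example – the field names (`machine_id`, `temp_c`, `read_at`) and thresholds are illustrative, not a real schema:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical raw sensor readings as they might land in a source table.
readings = [
    {"machine_id": "M1", "temp_c": 71.2, "read_at": datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)},
    {"machine_id": "M1", "temp_c": None, "read_at": datetime(2024, 5, 1, 12, 5, tzinfo=timezone.utc)},
    {"machine_id": "M2", "temp_c": 68.9, "read_at": datetime(2024, 5, 1, 12, 5, tzinfo=timezone.utc)},
]

def assess(rows, now, max_age=timedelta(minutes=15)):
    """Return simple completeness and timeliness metrics for a batch of rows."""
    complete = sum(1 for r in rows if all(v is not None for v in r.values()))
    latest = max(r["read_at"] for r in rows)
    return {
        "completeness": complete / len(rows),  # share of rows with no missing fields
        "stale": (now - latest) > max_age,     # has the feed stopped arriving recently?
    }

metrics = assess(readings, now=datetime(2024, 5, 1, 12, 10, tzinfo=timezone.utc))
```

Checks like these can gate a pipeline stage, so incomplete or stale batches are flagged for remediation rather than silently fed onwards.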
Context:
- Collate and transform sources of organisational context, such as departmental documentation, that can support AI when executing specific tasks.
- Collate and transform external context sources, such as product information, relating to the underlying data or processes.
Semantics:
- Build up the core semantic elements needed for AI to understand your data. This should include accurate descriptions of tables, columns, table relationships, and any key metrics.
- Utilise the Open Semantic Interchange standard to build up these artefacts and make them available to your AI applications. This can be done simply through access to underlying files, rather than requiring complex knowledge databases.
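As a sketch of what such an artefact might look like, the snippet below builds a small semantic model and writes it to a plain file. The structure and field names here are illustrative assumptions only – consult the Open Semantic Interchange specification for the actual schema it defines:

```python
import json

# A minimal, hypothetical semantic-layer artefact; field names are illustrative.
semantic_model = {
    "tables": [
        {
            "name": "sensor_readings",
            "description": "Raw machine-health telemetry, one row per sensor reading.",
            "columns": [
                {"name": "machine_id", "description": "Identifier of the monitored machine."},
                {"name": "temp_c", "description": "Sensor temperature in degrees Celsius."},
            ],
            "relationships": [
                {"to": "machines", "on": "machine_id", "type": "many_to_one"},
            ],
        }
    ],
    "metrics": [
        {
            "name": "avg_temp",
            "description": "Mean temperature per machine over a time window.",
            "expression": "AVG(temp_c)",
        }
    ],
}

# Persisting the artefact as a plain file is often enough for AI applications
# to load it as context, without a dedicated knowledge store.
with open("semantic_model.json", "w") as fh:
    json.dump(semantic_model, fh, indent=2)
```

The point is less the format than the habit: descriptions, relationships, and metric definitions live in version-controlled files that both humans and models can read.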
Whilst the above may seem like an arduous task, it can be pared down through realistic scoping. Start with a validated use case and build up a plan to tackle these elements as a vertical slice, which can then be expanded upon as you continue maturing your AI capability. To support this, an example is provided below of what to do in each of these areas for a specific manufacturing use case: predictive maintenance.
A manufacturing organisation needs to maintain a suite of machines as effectively as possible in order to limit costly downtime. To support this, data relating to machine health is fed in real time from sensors to their data platform. Currently, this data is transformed through multiple stages and built into a suite of reports for supervisors to understand when proactive maintenance is required. There is a clear use case here to utilise AI, either through automated analysis and work assignment or by providing an interface for supervisors to ask questions of the data, rather than reading time-consuming reports. However, to make this as accurate as possible, a number of key sources need to be provided to the underlying AI models.
- Timely, complete and unaggregated data needs to be available for the model to understand the past and current health of the machines, including sensor data, in-flight and completed maintenance work, part availability and staff skillsets and availability.
- Documentation relating to internal maintenance procedures, machine tolerances and any workforce processes must be collated and transformed, so that it is accessible to the underlying models.
- A semantic layer relating to the data needs to be created, documenting table and column descriptions, synonyms and relationships.
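The three sources above ultimately come together as context for the model. The sketch below shows one hypothetical way of doing that – the function, field names, and document contents are placeholders, and the resulting string would be passed to whichever model interface the organisation uses:

```python
# Hypothetical: combine unaggregated data, internal documentation, and the
# semantic layer into a single context block for an LLM call.
def build_context(sensor_rows, procedure_docs, semantic_summary):
    """Join raw data, organisational documentation, and semantics into one prompt context."""
    data_block = "\n".join(
        f"{r['machine_id']} temp={r['temp_c']} at {r['read_at']}" for r in sensor_rows
    )
    return (
        "## Semantic model\n" + semantic_summary + "\n\n"
        "## Maintenance procedures\n" + "\n".join(procedure_docs) + "\n\n"
        "## Latest sensor data (unaggregated)\n" + data_block
    )

context = build_context(
    sensor_rows=[{"machine_id": "M1", "temp_c": 92.4, "read_at": "2024-05-01T12:05Z"}],
    procedure_docs=["Shut down any press exceeding 90C and raise a priority work order."],
    semantic_summary="sensor_readings.temp_c: temperature in Celsius; joins to machines on machine_id.",
)
# `context` would accompany the supervisor's question in the model request.
```

With all three sources present, the model can reason over raw readings against documented tolerances rather than guessing at organisational conventions.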
By providing these foundational elements, the models now have the capability to process large amounts of data with the organisational context that is normally held within individuals or teams. Doing so will significantly improve the accuracy of any associated decisions and support you in automating aspects of a key process, with the ability to still build in a human approval loop.
Overcome the Key Challenges to Implementation
When adopting any new approach or technology, there is always going to be some level of pain for organisations who don’t have the luxury of starting from scratch. As noted earlier, utilising data for strategic benefit is not a new concept, and initiatives to drive data maturity have been underway for a long time, with mixed results. For organisations looking to build out their AI capabilities, it’s worth understanding the key challenges they’re likely to face – both from their current state and the broader complexities of adopting new technologies.
The Manual Build Up
Whilst adopting AI is focused on automating key processes and decisions, there is a large amount of work that needs to be done to make AI effective. Creating new data assets and building up a semantic view of your data is essential, but will involve effort from your teams that shouldn’t be underestimated. Data documentation is an often-overlooked aspect of data platforms; teams instead rely on internal knowledge and questioning colleagues for context. As such, this knowledge will need to be collated and transformed into a format that can be provided to AI applications – a highly manual task when starting from lower levels of data maturity. However, there are ways to leverage AI within this process – for example, by recording discussions about data structures, transcribing them, and feeding them into AI coding assistants to handle much of the manual effort.
Skill Availability
As with all technology advancements, there is likely to be a skills gap in your organisation that may limit how quickly you can mature. Whilst growing quickly, the pool of AI talent is still small and generally commands higher salaries. There should be some caution around rushing to invest in a permanent AI capability – especially given the cost – simply to show that action is being taken. A better approach is to understand your current capability gaps and utilise partners with the required expertise to fill them in the short term, allowing you to show value without the long-term cost. Leverage their expertise to shape your use cases and upskill your existing technical teams – delivering clear ROI from AI without significant upfront investment, while also giving your teams valuable development in a rapidly growing area.
Maintenance Burden
Adding new capability comes with a larger estate of data, technologies, and resources to manage – something which can have a significant impact in organisations that are lower on the data / tech maturity scale. The internal and external pressures to adopt AI can spread internal teams too thinly, leading to existing data and platform elements being ignored in favour of the new. This is particularly challenging in this space, given the clear divergence between what humans and AI need to operate effectively. Teams must now manage AI-specific data structures, contextual layers, and semantic elements alongside traditional systems. This complexity can be minimised through three key approaches:
- Maturing your data operations to minimise ongoing maintenance and development effort.
- Being specific in how you plan to implement AI, with a focus on a clear use case and setting organisational expectations.
- Equipping your teams with low-effort but high-impact tooling to make them as effective as possible.
Closing Summary
Hopefully this article has provided some insight into the impact that AI adoption will have on how organisations need to manage their data. Whilst at points it may appear to throw up barriers to entry for those pressing forwards with AI adoption, the intent is to provide a clear and reasoned summary of what is required, within the data space, to make this change as effective as possible. The excitement around what this technology can provide is well founded, provided that solid foundations are put in place to support it. Whilst there is likely to be some friction on this journey, there are actions that can be taken to minimise it and allow organisations to realise the touted benefits.