Exploring Databricks Genie: Enhancing Insight Through Conversational Analytics

By Richard Louden, Head of Technology (Data) at Nimble Approach

Many organisations are eager to adopt AI, with a number of the larger platform players marketing key functionality to accelerate this. This blog explores whether the reality of these features matches the marketing.

As AI functionality is integrated with data platforms, a potential evolution of dashboards and reporting has emerged where users can now answer questions around organisational data via natural language.

This blog reviews Databricks Genie Spaces, one of the current offerings in this area, to assess how it can deliver value to organisations and highlight the factors that should be considered when implementing it.

Evolving Organisational Insight Using LLMs

For the majority of business users, dashboards are the main approach to understanding what has happened within their organisation to support decision making. The development of these pre-determined reports follows a standard pattern of: understand the questions a user will ask of the data, develop underlying transformations, and present this data to help them answer said questions. Whilst effective at creating an informative output at first, this process falls down as users start to expand and evolve their questions. Addressing these evolving requirements often demands new data sources, visualisations, and, in some cases, entirely new dashboards, all of which require development effort and can impact the user experience.

As this process continues, it becomes more obvious that a single solution is not fit for the two distinct use cases – i.e. quickly understanding an aspect of the organisation and interrogating organisational data as a non-technical user. While pre-defined dashboards can satisfy the first requirement, they are less effective at addressing the second.

To bridge this gap, organisations are increasingly turning to alternative solutions, with Large Language Models (LLMs) emerging as a particularly promising option. Such tools have cemented themselves as effective coding assistants, evolving how engineers build applications and data pipelines. Extending similar capabilities to business users is a natural next step. Given access to the appropriate data and context, LLMs can translate business questions into underlying queries and deliver relevant, data-driven answers.

Databricks Genie Spaces: A Natural Language Overlay To Your Data

With the majority of organisations storing copies of their organisational data in one of the major platform solutions (Databricks, Snowflake & Fabric), it’s not surprising to see these providers develop products in this space. Databricks addresses this need through Genie Spaces, enabling organisations to create tailored environments in which users can explore and query data using natural language. On top of this core functionality, they have been woven into the Databricks ecosystem to provide a suite of supporting features, such as:

Permission management through Unity Catalog, ensuring robust data governance and security.
Use of metric views as a data source, to simplify investigation of key metrics.
Mechanisms to provide metadata to support the underlying LLM agent in its tasks, such as join conditions and business synonyms.
Ability to combine Genie Spaces with other Databricks AI offerings to create agent chains for more complex tasks.

On paper, this provides a secure way for users to query organisational data, reducing their reliance on curated dashboards and enabling a more flexible approach to data exploration. With this in mind, I wanted to test the technology firsthand to understand where it could deliver the greatest value within future Databricks platforms I work on.

Testing Genie Spaces

Space creation is supported through both the Databricks portal and API, but there is currently no support for provisioning Genie Spaces via Asset Bundles. To create a space via the API, you need to provide a serialized_space parameter. This is a large JSON string that defines all the elements for the space, such as data connections, semantics, and example queries. Given its complexity, I would recommend creating an initial space through the portal and then extracting the associated configuration if you need to replicate it across other workspaces.
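The extract-and-replicate approach described above can be sketched roughly as follows. This is a minimal illustration, not the documented request shape: only serialized_space comes from the API as described here, while the endpoint path and the title and warehouse_id field names are assumptions that should be checked against the current Databricks REST API reference. The sketch only builds the request body and does not call any service.

```python
import json

# Assumed endpoint path, shown for illustration only; verify against the
# current Databricks REST API reference before relying on it.
CREATE_SPACE_PATH = "/api/2.0/genie/spaces"


def build_create_request(exported_space: dict, warehouse_id: str, title: str) -> dict:
    """Build a create-space request body from a configuration exported elsewhere."""
    serialized = exported_space.get("serialized_space")
    if not serialized:
        raise ValueError("exported space has no serialized_space value")
    # serialized_space is itself a JSON string, so confirm it parses before reuse.
    json.loads(serialized)
    return {
        "title": title,                  # assumed field name
        "warehouse_id": warehouse_id,    # assumed field name
        "serialized_space": serialized,  # parameter named in the text above
    }


# Placeholder export standing in for a configuration pulled from a portal-built space.
exported = {"serialized_space": json.dumps({"placeholder": "config from source workspace"})}
body = build_create_request(exported, warehouse_id="<target-warehouse-id>", title="Sales Genie (test)")
print(json.dumps(body, indent=2))
```

Keeping the payload construction separate from the HTTP call makes it easy to review or version the configuration before it is pushed to another workspace.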

Creation via the portal is incredibly simple – just locate the Genie Spaces page, click on ‘create new’ in the top right, and then add your data sources. This will give you a chat UI – similar to the one shown in Figure 1 – where you can pose your questions once you’ve connected your data sources.

Now that you have the basic elements, you can start adding organisational context to improve the responses. This includes specifying how tables join together, providing internal synonyms for column names, and giving instructions to the underlying LLM, such as ‘this means X’. Configuring a space through the portal requires stepping through several interfaces (Figure 2), which can take some time. However, this process improves response accuracy and produces a reusable template that can later be extracted via the API for use in additional spaces.

Figure 2 – Adding organisational context and semantics

Once you have your data and metadata organised, you can start using the space to investigate – with a lot more freedom than you would find in a standard dashboard (Figure 3). Once a question is posed, the underlying LLM will assess the data it has access to, build up a plan of how best to answer the query, and then develop and run code to answer it.

Alongside the query output and associated visualisations, users are able to access the actual code that was run (Figure 4), meaning those with an understanding of SQL and the data assets can assess how accurate the response is. This is very important during the set-up phase of Genie Spaces, as it allows data engineers and analysts to test known queries with expected outputs and tweak any of the Space’s metadata.
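One way to make the "known queries with expected outputs" check repeatable is a small harness like the sketch below. It is an illustration under stated assumptions: ask is a hypothetical callable that takes a question and returns the result rows (for example by running the SQL that Genie generated), and the questions and figures are made-up placeholders rather than real results.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping, Sequence


def rows_match(actual: Iterable[Sequence], expected: Iterable[Sequence]) -> bool:
    """Compare two result sets, ignoring row order but respecting duplicates."""
    return Counter(map(tuple, actual)) == Counter(map(tuple, expected))


def run_benchmarks(
    ask: Callable[[str], Iterable[Sequence]],
    cases: Mapping[str, Iterable[Sequence]],
) -> list:
    """Return the questions whose answers differ from the expected rows."""
    return [q for q, expected in cases.items() if not rows_match(ask(q), expected)]


# Placeholder benchmark cases: question -> rows an analyst has already verified.
cases = {
    "Total orders by region": [("North", 120), ("South", 95)],
    "Total orders overall": [(215,)],
}


def fake_ask(question: str):
    """Stand-in for a real call; returns canned rows so the sketch runs offline."""
    canned = {
        "Total orders by region": [("South", 95), ("North", 120)],  # same rows, new order
        "Total orders overall": [(210,)],                           # deliberately wrong
    }
    return canned[question]


failures = run_benchmarks(fake_ask, cases)
print(failures)
```

Any question that lands in failures is a prompt to inspect the generated SQL and adjust the space's joins, synonyms, or instructions.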

Figure 3 – Example of Genie output

Figure 4 – Underlying SQL query created and run by the LLM

In addition to providing access to underlying tables, Genie can leverage metric views to improve the reliability of organisational metric calculations, rather than relying on the LLM to determine the appropriate data, filters, and aggregations. Metric views, defined with SQL or YAML (Figure 5), have been implemented to move the mass of organisational context that exists in BI systems into the data platform, where it can be more tightly governed.

For example, definitions can be created for different organisational groups, to ensure that their specific variations on key metrics are not lost. These can then be enhanced through adding departmental context, such as comments and synonyms, that support the Genie Space LLM in converting variable natural language queries into the correct metric. An example of how the underlying queries differ is shown in Figure 6, where the same question was posed as in the previous example. When a metric view is provided as a data source, the LLM can leverage its associated metadata to retrieve defined metrics, improving accuracy compared to inferring calculations directly from raw data tables.
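To illustrate why names and synonyms on a metric definition help, the toy sketch below resolves a user's phrase to a single governed expression. It is a deterministic stand-in, not how Genie or metric views work internally, and the metric names, expressions, and synonyms are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Metric:
    """A governed metric: one agreed expression plus the terms people use for it."""
    name: str
    expr: str
    synonyms: Tuple[str, ...] = ()


def resolve_metric(phrase: str, metrics: Sequence[Metric]) -> Optional[Metric]:
    """Return the metric whose name or synonym matches the phrase, if any."""
    needle = phrase.strip().lower()
    for metric in metrics:
        labels = (metric.name, *metric.synonyms)
        if needle in (label.lower() for label in labels):
            return metric
    return None


# Hypothetical definitions, standing in for what a metric view would hold.
metrics = [
    Metric("total_revenue", "SUM(order_value)", ("revenue", "sales", "turnover")),
    Metric("order_count", "COUNT(DISTINCT order_id)", ("orders", "number of orders")),
]

match = resolve_metric("Turnover", metrics)
print(match.expr if match else "no governed metric found")
```

The point of the sketch is that "revenue", "sales", and "turnover" all lead to the same agreed calculation, instead of three separately inferred ones.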

Figure 5 – Example of a Databricks metric view definition

Figure 6 – Example of a Genie Space query when relying on metric views as opposed to raw data tables

Closing Thoughts

After spending some time building out a Genie Space and working through a few of the enhancements, my conclusion is that there is clear value in the concept. Having spent years building dashboards for various key user groups, I understand the pains of their rigidity and the ease with which they can multiply as users find new questions to ask of their now accessible data.

My opinion is that this value is found through using Genie Spaces to augment and rationalise organisational dashboards, rather than as a direct replacement. Dashboards still have a place as a cost-effective mechanism for users to understand key metrics and past performance, especially now that the semantic elements can be shifted into the data platform where they can be better managed. Genie Spaces can then enable users to directly investigate more specific areas, and support the core reporting layer through two key processes:

Help rationalise the number of existing dashboards to reduce maintenance and operational costs by replacing those that do not align with agreed organisational metrics.
Provide a governed space for non-technical users to query data, fulfilling future requirements that would otherwise require the development of new analytical processes or outputs.

As with adopting any new technology, however, consideration needs to be given to the more mundane areas that support the longevity of this approach, rather than just focusing on the interesting outcome. For Genie Spaces, this currently falls into three main areas: reproducibility, cost, and governance.

Reproducibility: Organisations need to consider both how Genie Spaces can be replicated across multiple environments and how responses to similar queries are tracked. The replication aspect can only be automated via the API at present, which will likely require new processes to be established but does significantly reduce manual effort. In terms of ensuring accuracy, space owners can establish benchmarks to test their spaces and use a number of monitoring tools to ensure that question X consistently returns answer Y.

Cost: As with most Databricks features, there is an underlying cost. For Genie Spaces, this is not based on LLM usage, but on the underlying SQL warehouse that is used to access data and calculate responses. Giving users free rein to pose questions can therefore end up accruing a larger bill than organisations may have come to expect compared with dashboards that are updated at set intervals. As such, workspace administrators should establish reasonable budgets and policies on such resources to prevent overspending and provide clear usage statistics to end users.
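A rough way to reason about this is to compare compute hours under each usage pattern. The sketch below is back-of-the-envelope arithmetic only: every number in it is a placeholder, not a real Databricks price or measured usage, and it assumes cost scales linearly with the hours the compute is active.

```python
def estimate_monthly_cost(
    dbu_per_hour: float,
    price_per_dbu: float,
    active_hours_per_day: float,
    days: int = 30,
) -> float:
    """Estimate monthly compute cost as rate x price x active hours x days."""
    return dbu_per_hour * price_per_dbu * active_hours_per_day * days


# Placeholder inputs chosen for easy arithmetic, not taken from any price list.
DBU_PER_HOUR = 10.0
PRICE_PER_DBU = 0.5

# Scheduled dashboard refreshes: compute is active for a short, predictable window.
scheduled = estimate_monthly_cost(DBU_PER_HOUR, PRICE_PER_DBU, active_hours_per_day=1)

# Ad hoc questions through the day keep the compute active for much longer.
ad_hoc = estimate_monthly_cost(DBU_PER_HOUR, PRICE_PER_DBU, active_hours_per_day=6)

print(f"scheduled refreshes: {scheduled:.2f}")
print(f"ad hoc questions:    {ad_hoc:.2f}")
print(f"ratio: {ad_hoc / scheduled:.1f}x")
```

Swapping in an organisation's own rates and observed active hours gives a first estimate to set budgets against.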

Governance: Given that Genie Spaces can be used to pull back row-level data, there needs to be a clear governance wrapper to prevent misuse or certain users accessing inappropriate data. This can be managed through Unity Catalog, but it needs these foundations in place to function correctly. Clear permission structures and attribute-based controls should be established on the underlying data, which can then be enforced by the Genie Space on a per-user basis.
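As one example of a control placed on the underlying data, the sketch below generates SQL for a Unity Catalog row filter, so that whatever a user asks, they only see rows they are entitled to. It is a minimal sketch of one option rather than a full governance design: the table, function, column, group, and value names are all hypothetical, and the generated statements should be reviewed against the current Unity Catalog documentation before use.

```python
import re

# Allow only simple dotted identifiers so names can be embedded in SQL safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def _checked(name: str) -> str:
    """Reject anything that is not a plain (optionally dotted) identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def row_filter_statements(
    table: str, function: str, column: str, admin_group: str, visible_value: str
) -> list:
    """Build SQL that lets admins see all rows and everyone else one value only."""
    table, function, column = _checked(table), _checked(function), _checked(column)
    admin_group, visible_value = _checked(admin_group), _checked(visible_value)
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {function}({column} STRING)\n"
        f"RETURN IF(IS_ACCOUNT_GROUP_MEMBER('{admin_group}'), true, "
        f"{column} = '{visible_value}');"
    )
    apply_fn = f"ALTER TABLE {table} SET ROW FILTER {function} ON ({column});"
    return [create_fn, apply_fn]


# Hypothetical names, for illustration only.
statements = row_filter_statements(
    table="main.sales.orders",
    function="main.sales.region_filter",
    column="region",
    admin_group="data_admins",
    visible_value="UK",
)
print("\n\n".join(statements))
```

Because the filter lives on the table rather than in the space, it applies regardless of how a question is phrased.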

Clearly there is a valuable niche for Genie Spaces to fill when it comes to helping users understand their organisation’s data, and it does this well. As with all new functionality, however, there need to be clear data and governance foundations in place from the outset to support both its accuracy and longevity post-implementation.
