What is Sidekick? An introduction to AI-powered data discovery

The problem Sidekick solves

Before an enterprise can use its data to power AI, make better decisions, or simply understand what it owns, it first needs to know what data it has, where it lives, what it means, and whether it can be trusted.

That sounds straightforward. In practice, it almost never is. Data sits across dozens of siloed systems — SQL databases, ERP platforms, data warehouses, legacy applications — accumulated over years with inconsistent naming, no central catalog, and documentation that exists only in the heads of people who may have already left the organisation.

This is the problem Sidekick was built to solve.

The core insight: You cannot reason over data you do not understand. Most AI initiatives stall not because the models are wrong, but because nobody actually knows what data they have, where it lives, or what it means. Sidekick solves that before anything else.

What Sidekick is

Sidekick is an on-premises AI platform that gives enterprises a living, governed understanding of their entire data estate — automatically. It connects to your existing databases, scans every table and field, builds a plain-English data dictionary and ontology, classifies sensitive data, and makes the whole estate queryable in natural language.

No replatforming. No data scientists required. No data leaves your environment.

Sidekick is built and maintained by Sidekick Lab, spun out of FitVault in 2025. It has been deployed with enterprise clients across media, insurance, financial services, healthcare, and government sectors.

How it works: the three-phase chain

Sidekick works through a structured three-phase process that takes your data estate from opaque to actionable.

Phase 01: Discover
Automated scan of every connected source, covering tables, fields, relationships, data types, and PII classifications.

Phase 02: Understand
Builds a plain-English data dictionary, ontology, and sensitivity heatmap across your entire estate.

Phase 03: Reason
Natural language querying, real-time reports, and cross-source insights. No SQL. No tickets. On demand.

Discovery and cataloging are the foundation. Everything else — querying, reporting, AI grounding, compliance assessments — depends on having an accurate, governed picture of what exists first.
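
To make the Discover phase concrete, here is a minimal sketch of a metadata scan. It is not Sidekick's implementation, which this article does not describe. It assumes an in-memory SQLite database as a stand-in for a connected source, and the names discover_schema and build_demo_db, along with the demo tables, are hypothetical.

```python
import sqlite3


def discover_schema(conn: sqlite3.Connection) -> dict:
    """Collect tables, columns, types, and foreign keys from a SQLite database.

    Illustrative only: a real scanner would read the metadata views of
    each connected system rather than SQLite PRAGMAs.
    """
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        catalog[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return catalog


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny hypothetical source with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total REAL
        );
        """
    )
    return conn


if __name__ == "__main__":
    for table, info in discover_schema(build_demo_db()).items():
        print(table, info)
```

The scan only reads metadata, never row contents, which is enough to list tables, fields, data types, and declared relationships.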

What you get out of it

Once Sidekick has scanned and cataloged your data sources, you have access to a set of outputs that would otherwise require months of manual effort:

A data dictionary with business descriptions, quality scores, and sensitivity classifications for every field
An enterprise data catalog spanning all connected sources
Automatic PII and sensitive data detection across every system, not just the ones someone thought to check
A cross-source interoperability map showing how your data assets relate to one another
An AI-readiness report identifying where your data is strong and where the gaps are
Natural language querying: ask a question in plain English, get an answer
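
As a rough illustration of how a sensitivity classification could be attached to each field in a data dictionary, the sketch below tags fields by name pattern. This is an assumption made for illustration, not Sidekick's detection method. The patterns, the DictionaryEntry shape, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical name patterns, for illustration only. Name matching alone
# will miss sensitive data stored under uninformative field names.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile|msisdn", re.IGNORECASE),
    "national_id": re.compile(r"(^|_)(ssn|id_number|passport)(_|$)", re.IGNORECASE),
    "date_of_birth": re.compile(r"(^|_)(dob|birth)", re.IGNORECASE),
}


@dataclass
class DictionaryEntry:
    table: str
    field_name: str
    data_type: str
    sensitivity: str  # "pii:<kind>" or "none"


def classify_field(name: str) -> str:
    """Return a sensitivity label based on the field name alone."""
    for kind, pattern in PII_PATTERNS.items():
        if pattern.search(name):
            return f"pii:{kind}"
    return "none"


def build_entries(table: str, columns: list) -> list:
    """Turn (name, type) pairs into dictionary entries with a sensitivity label."""
    return [
        DictionaryEntry(table, name, dtype, classify_field(name))
        for name, dtype in columns
    ]


if __name__ == "__main__":
    for entry in build_entries("customers", [("email", "TEXT"), ("signup_date", "TEXT")]):
        print(entry)
```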

Who it is for

Sidekick is designed for enterprise organisations, but it delivers different value to different people within them.

Role	What they get from Sidekick
CIO / CTO	Full visibility across the data estate; migration impact analysis; system rationalisation
CDO / Data Lead	Automated data dictionaries, ontology generation, lineage mapping — without a 12-month project
CISO / Compliance	PII detection, sensitivity classification, POPIA and GDPR readiness reporting
AI / Data teams	A governed, structured foundation for LLMs and AI agents; auto-generated use case candidates
CEO / Board	Enterprise data visibility; AI readiness assessment; strategic data asset identification

How it deploys

Sidekick runs entirely within your own environment — on two virtual machines (one Windows, one Linux) that you provision. It does not require internet access. Your data never leaves your infrastructure.

Windows VM: SQL Server + agent memory
Runs SQL Server 2022. Stores agent memory, configuration, and read-only integrations into your data sources via linked servers. Minimum 4 vCPU / 16 GB RAM.

Linux VM: Docker containers
Runs Sidekick's frontend and backend as Docker containers via Docker Compose. Receives updates by pulling from Azure Container Registry. Minimum 2 vCPU / 8 GB RAM.

Sidekick connects to your data sources in read-only mode through linked servers. It has zero write access to your systems. Access is managed through Microsoft Entra ID with role-based controls.

For organisations with highly sensitive environments, physical on-domain server deployment is also available as an alternative to VMs.
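
Before provisioning, a planned VM pair can be checked against the minimums stated above (4 vCPU / 16 GB RAM for Windows, 2 vCPU / 8 GB RAM for Linux). The sketch below is a hypothetical helper, not a Sidekick tool. VmSpec, MINIMUMS, and sizing_problems are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    vcpus: int
    ram_gb: int


# Minimum sizes as stated in this article.
MINIMUMS = {
    "windows": VmSpec(vcpus=4, ram_gb=16),
    "linux": VmSpec(vcpus=2, ram_gb=8),
}


def sizing_problems(plan: dict) -> list:
    """Return a list of human-readable problems; an empty list means the plan meets the minimums."""
    problems = []
    for role, minimum in MINIMUMS.items():
        spec = plan.get(role)
        if spec is None:
            problems.append(f"{role}: VM missing from plan")
            continue
        if spec.vcpus < minimum.vcpus:
            problems.append(
                f"{role}: {spec.vcpus} vCPU is below the minimum of {minimum.vcpus}"
            )
        if spec.ram_gb < minimum.ram_gb:
            problems.append(
                f"{role}: {spec.ram_gb} GB RAM is below the minimum of {minimum.ram_gb}"
            )
    return problems


if __name__ == "__main__":
    plan = {"windows": VmSpec(vcpus=4, ram_gb=8), "linux": VmSpec(vcpus=2, ram_gb=8)}
    for problem in sizing_problems(plan):
        print(problem)
```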

What Sidekick is not

It is worth being precise about scope. Sidekick is the data intelligence and governance layer — phases 1 through 3 of the data value chain. It discovers, understands, and makes your data queryable. It does not replace your data warehouse or storage infrastructure, and it does not deploy AI models into production or build data products on your behalf.

It makes your existing data infrastructure understandable and gives you a sound basis for deciding where to invest. The decisions about what to build on top of that foundation remain yours.

Getting started

Sidekick is deployed by a Sidekick Lab Partner — an organisation that handles infrastructure provisioning, SQL setup, and Docker deployment using provided scripts, guided by Sidekick Lab's technical team. Sidekick Lab provides AI model provisioning, deployment support, and ongoing updates.

For most Azure-hosted deployments, the Sidekick Lab team can deploy directly into a provisioned resource group. Self-provisioning is also fully supported.

Ready to get started?

See what Sidekick finds in your data estate

Most organisations are surprised by what a first discovery surfaces. Talk to the Sidekick Lab team about a Proof of Value engagement — scoped, time-bounded, and deployed in your own environment.