Filling the data platform gap in a digitally sovereign tech stack
Across the EU and here in Sweden, authorities and companies are working hard to come to terms with a changed geopolitical environment and concerns about data and digital regulation. Digital sovereignty is now a topic of EU summits.
The debate is often unfairly framed as a tradeoff between fast, US-dependent innovation and slow-moving European regulation. Trice has done several advisory and exploratory projects in this space together with authorities, healthcare companies, and players in the defense sector. In this article, we want to share some conclusions in line with our continued focus on hands-on, value-oriented tech strategy.
It is our core belief that Europe has the skills, innovation speed and capacity to build great products across the spectrum from fully embracing the platforms of international tech giants to fully sovereign systems that do not come under any other control than our own.
Data is a key component of any modern, AI-driven tech stack
Organizations need clear ownership and control of the data that creates value or helps them serve their users in the best way possible. As traditional industries become data driven, this data needs to be made available in centralized systems for sophisticated analytics, innovative products or productive AI agents.
Typically, a modern data platform consists of large amounts of storage, a catalogue that keeps track of and indexes the data in storage, and a query engine for manipulating the data. This is combined with the scalable compute capacity that the query engine needs to run, plus scheduling and data pipeline processing to go from raw data to sophisticated data products. It is worth mentioning that many organizations are still working with older analytics and business intelligence solutions, partly because of the challenges outlined in this article.
Existing data platforms arose from the needs of tech companies to process enormous amounts of telemetry from users, mobile applications and other sources. They inherited a lot of their functionality and design from big data applications in science. Their evolution has been driven by the need to get all data in one place and to allow smart people, and later AI systems, to look at the data and derive insights from it.
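To make the division of labour between these components concrete, here is a deliberately tiny sketch in Python. It is a toy model, not how any real platform is implemented: the object paths, the `orders` table and the `Catalogue` and `scan` names are all invented for illustration, and storage is simulated with an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, int]

# Storage layer (simulated): object paths mapped to the rows each file holds.
# The paths are made up for this example.
object_store: Dict[str, List[Row]] = {
    "store://lake/orders/part-0": [
        {"order_id": 1, "amount": 120},
        {"order_id": 2, "amount": 80},
    ],
    "store://lake/orders/part-1": [{"order_id": 3, "amount": 200}],
}


@dataclass
class Catalogue:
    """Keeps track of which files in storage make up each logical table."""

    tables: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, name: str, files: List[str]) -> None:
        self.tables[name] = list(files)

    def files_for(self, name: str) -> List[str]:
        return self.tables[name]


def scan(
    catalogue: Catalogue,
    store: Dict[str, List[Row]],
    table: str,
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Query engine: resolve the table via the catalogue, then read and filter."""
    return [
        row
        for path in catalogue.files_for(table)
        for row in store[path]
        if predicate(row)
    ]


catalogue = Catalogue()
catalogue.register("orders", sorted(object_store))
large_orders = scan(catalogue, object_store, "orders", lambda r: r["amount"] >= 100)
```

The point of the separation is that users only ever name the table; where the files live, and how many there are, stays a concern of the catalogue and storage layers.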
While the “Big Data” use case is now well served by data platforms, more sophisticated needs around secure data sharing and being able to combine carefully selected datasets for tailored audiences have only recently started to be addressed. As anyone who has struggled with access controls and sharing on any intranet knows, getting the right data to relevant people is not straightforward. Add in a layer of AI agents and it gets even more complicated to understand who can access what.
Turning off-the-shelf open-source components into a comprehensive data platform offering
As data processing is a shared need across many fields, we are seeing a strong lineup of capable open-source projects in this space. They are often backed by big industry players and are already in use in many well-known applications. Based on our research and work with clients, we recommend a stack built with Trino as the query engine, Apache Iceberg as the main component for the catalogue and table format, and Prefect as the workflow engine.
While the suggested components are strong building blocks for a full data platform, they do not offer a turn-key solution. From a digital sovereignty perspective, they also do not solve the core problem of having enough pooled storage and compute to be able to scale data processing as needed. European cloud offerings such as evroc are starting to address that and can be combined with well-audited compute solutions, e.g. Welkin from Swedish Elastisys. We also track interesting developments in the hybrid and on-prem datacentre space that enable more sophisticated data processing.
To unlock the full value of modern data platforms and AI in more spaces, two pieces remain to be added. The first is tailored deployments to suitable environments. The second is software, and to some extent UI, that binds the open-source components together and makes it easy to plug in notebooks, AI agents with MCP support and other building blocks that enable an analytics-driven, AI-assisted way of working.
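As a rough illustration of how these pieces meet in practice, the sketch below builds the SQL for one raw-to-curated pipeline step against an Iceberg table, written in Trino's SQL dialect. Everything named here is an assumption made for the example: the `lake` catalog, the `telemetry` schema, both tables and their columns. To keep the sketch free of third-party dependencies, the statements are handed to an injected `execute` callable. In a real deployment that callable would typically wrap a cursor from the Trino Python client, and `run_pipeline` would be registered as a flow in Prefect; neither is shown here.

```python
from datetime import date
from typing import Callable, List

# Hypothetical names, chosen only for this example.
CATALOG = "lake"
SCHEMA = "telemetry"


def pipeline_statements(day: str) -> List[str]:
    """Return Trino SQL that rebuilds one day of a curated Iceberg table."""
    # Parsing the date rejects malformed input before it reaches any SQL text.
    day = date.fromisoformat(day).isoformat()
    raw = f"{CATALOG}.{SCHEMA}.raw_events"
    curated = f"{CATALOG}.{SCHEMA}.daily_device_counts"
    return [
        # Iceberg table properties as exposed by Trino's Iceberg connector.
        f"CREATE TABLE IF NOT EXISTS {curated} "
        "(event_day date, device_id varchar, events bigint) "
        "WITH (format = 'PARQUET', partitioning = ARRAY['event_day'])",
        # Delete-then-insert keeps the step safe to re-run for the same day.
        f"DELETE FROM {curated} WHERE event_day = DATE '{day}'",
        f"INSERT INTO {curated} "
        "SELECT CAST(event_time AS date), device_id, count(*) "
        f"FROM {raw} "
        f"WHERE CAST(event_time AS date) = DATE '{day}' "
        "GROUP BY 1, 2",
    ]


def run_pipeline(day: str, execute: Callable[[str], object]) -> int:
    """Run the statements in order through `execute`; return how many ran."""
    statements = pipeline_statements(day)
    for statement in statements:
        execute(statement)
    return len(statements)


# Dry run: collect the statements instead of sending them to a cluster.
planned: List[str] = []
run_pipeline("2024-05-01", planned.append)
```

Making the step idempotent per day is a design choice rather than a requirement of any of the tools; it simply makes retries by the workflow engine harmless.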
Including the pieces needed for modular, auditable and secure data sharing
There is one more dimension to applying data platforms in more restricted industries and government activities: we have observed a need for better inclusion of identity, access controls, auditing and security monitoring in data platforms. The big commercial platforms have rapidly improved support in these areas in recent years. Even so, adding security to a platform that was built to openly share all of your “big data” across the organization comes with legacy challenges.
Our current research focus is on this space. We are looking at what tools exist and what we can build to enable timely, rule-based access to key pieces of information in near real time. This needs to be done in a way that makes it easy to understand who has access to what data, and for what purpose, with controls that make auditing and compliance easy and built in.
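A minimal sketch of what rule-based, purpose-bound access with built-in auditing could look like is shown below. It is an illustration under our own assumptions, not a description of an existing tool: the `Rule`, `AuditEntry` and `AccessController` names, the roles and the dataset names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Grants one role access to one dataset, for one purpose, until `expires`."""

    role: str
    dataset: str
    purpose: str
    expires: datetime


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    user: str
    role: str
    dataset: str
    purpose: str
    allowed: bool


@dataclass
class AccessController:
    rules: List[Rule]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def check(self, user: str, role: str, dataset: str, purpose: str,
              now: Optional[datetime] = None) -> bool:
        """Deny by default; every decision, allowed or not, is logged."""
        now = now or datetime.now(timezone.utc)
        allowed = any(
            r.role == role and r.dataset == dataset
            and r.purpose == purpose and now < r.expires
            for r in self.rules
        )
        self.audit_log.append(AuditEntry(now, user, role, dataset, purpose, allowed))
        return allowed

    def who_can_access(self, dataset: str,
                       now: Optional[datetime] = None) -> List[Tuple[str, str]]:
        """Answer 'who has access to what, and why' as (role, purpose) pairs."""
        now = now or datetime.now(timezone.utc)
        return [(r.role, r.purpose) for r in self.rules
                if r.dataset == dataset and now < r.expires]


controller = AccessController(rules=[
    Rule("clinician", "patient_records", "treatment",
         expires=datetime(2030, 1, 1, tzinfo=timezone.utc)),
])
```

Because the purpose is part of the rule and of every audit entry, the log can answer not just who read a dataset but on what grounds, which is the property the paragraph above asks for.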
Wrapping up – the journey to sovereign data processing and AI access
As outlined above, combining the following components makes it possible to build strong, fully self-contained data platforms that enable any organization to be data driven and use AI on its own terms.
Access to a modern compute and storage platform capable of scaling according to the data processing needs, for example a hybrid cloud using a local provider such as evroc in Europe.
Infrastructure and deployment capabilities to package, deploy and support existing open-source components that can be combined into a highly capable data platform.
A user-friendly glue layer that makes it easy to work with and understand the platform.
For highly regulated or sensitive applications: components for rule-based access control, automated retention and security monitoring.
Reach out to us if you are interested in getting the full presentation or want to discuss some of these areas in more detail.