Photo from orbit of the Earth at night, lit up by lights

Where is my data anyway? Data sovereignty challenges in a world of SaaS and Cloud

'Where is my data?' is a tricky question to answer succinctly in a modern world dominated by cloud services, APIs and globally distributed Content Delivery Networks (CDNs). In the good old days when you just stored everything in your monolithic CMS and hosted it yourself, it was easy to answer. Nowadays, we may need to rephrase the question, or ask different questions entirely.

Picture of Luminary CTO Andy smiling with a black background

By Andy Thompson, 3 June 2022. 7 minute read.

The most succinct definition I think I've heard of "The Cloud" is simply "someone else's computer". We live in a modern digital world increasingly dominated by Software-as-a-Service (SaaS) and Cloud services, while also becoming more aware of and concerned by privacy considerations and regulations.

Data residency refers simply to where data is hosted. Data sovereignty is a little more involved, as it deals with which laws and regulations apply to the data because of where it resides. While this has been a concern for some time in industries such as finance, charities and NFPs, e-commerce and member organisations, it became even more complicated, and a concern for all industries, with the 2018 introduction of the General Data Protection Regulation (GDPR) in Europe. GDPR has been described as 'the toughest privacy and security law in the world', and many other jurisdictions worldwide are looking to follow suit.

There are plenty of resources online explaining GDPR or CCPA (the Californian equivalent), including my own discussion of what GDPR's introduction means for Australians (posted a few years ago now!). I won't go into detail on them in this post.

Ten years ago in the good ol' days, the solution to data sovereignty concerns was simple - host everything in-house, and control it tightly. All-in-one Digital Experience Platforms (DXPs) such as Kentico Xperience, Optimizely, or Sitecore followed that trend and became extremely popular with enterprise organisations with stringent data security policies.

Now, as cloud providers, Software-as-a-Service (SaaS), Composable DXPs and technologies such as the Jamstack provide huge benefits in the areas of security, reliability and scalability, the situation has reversed, and enterprises are typically pushing everything off-premises, and into the cloud.

A few SaaS products, such as headless CMS Kontent by Kentico, address some of these concerns by giving customers the option of which region, or continent, they would like their data to reside in.

But nowadays, how do you keep track of it all when your data is almost certainly integrated with, and spread across, any number of global, cloud-hosted digital services?

There's data, and there's data

The entire World Wide Web is just data. Frankly, some of it needs to be protected, and some of it doesn't. Let's look at the different kinds of data. I apologise if this is not a technically brilliant deep dive into the different types of data and their legalities; this is a brief summary to help me get to my point in the next section! Think of data broadly as:

Personally Identifiable Information (PII)

This is the data that causes a lot of concern, because it's a) sensitive, and b) tightly controlled by laws in many regions.

The Australian Government defines PII as 'Information that can be used on its own or with other information to identify, contact or locate a single person, or to identify an individual in context.'

Examples include names, email addresses, account numbers, physical addresses, birth dates, phone numbers, photographs, or biometric identifiers (such as fingerprints).

Of course, you can process and publish PII with permission. For example, this website is littered with team members' names, qualifications, locations, and photographs. The email addresses of people who subscribe to our newsletter, however, must be protected.

Anonymous/non-identifiable user data

This refers to data that is collected about people, and processed to remove any of the PII listed above. For example, your Google Analytics summary statistics showing the number of page views over a period of time, or even showing that there was a single user currently sitting in your checkout having not purchased yet (as long as you can't tell who it is).

It's important to remember that the definition of PII above includes "or with other information", so if this information could potentially be the missing piece in a puzzle to identify someone, then it should be treated like PII. But if it's impossible to identify people using this information, then it does not require the same controls as PII.
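To make the "or with other information" point concrete, here is a minimal Python sketch. It is my own illustration rather than anything taken from a regulation: it borrows the idea behind k-anonymity (counting how many records share the same combination of values), and the field names and sample records are made up.

```python
from collections import Counter


def smallest_group(records, fields):
    """Return the size of the smallest group of records that share the
    same values for the given fields (0 if there are no records)."""
    groups = Counter(tuple(record[f] for f in fields) for record in records)
    return min(groups.values(), default=0)


# Hypothetical "anonymous" analytics records: no names, no emails.
records = [
    {"postcode": "3000", "birth_year": 1985, "browser": "Firefox"},
    {"postcode": "3000", "birth_year": 1985, "browser": "Chrome"},
    {"postcode": "3000", "birth_year": 1990, "browser": "Chrome"},
    {"postcode": "3000", "birth_year": 1990, "browser": "Chrome"},
]

print(smallest_group(records, ["postcode"]))                           # 4
print(smallest_group(records, ["postcode", "birth_year"]))             # 2
print(smallest_group(records, ["postcode", "birth_year", "browser"]))  # 1
```

A smallest group of 1 means at least one person is singled out by that combination of fields, so the combined data deserves the same care as PII, even though no single field identifies anyone.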

Public/marketing content

In most cases (such as ours), this is the majority of the data you're dealing with for a digital project such as a large website, and in some cases it's the only type of data stored, processed and published by many of the systems in your solution architecture. For example, you don't need to be concerned about the data sovereignty of your own blog posts that you intend to share with as many people as possible worldwide.

If in doubt

If something is in the grey area between two categories, you're best to err on the side of caution and treat it like it's more sensitive. For example, if you have data that you think is anonymised but you're not sure if it could be used to identify an individual (e.g. IP addresses or device fingerprints), maybe just roll with it like it's PII if you can.
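The "if in doubt" rule is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions: the three category names mirror the headings above, and the `classify` helper is hypothetical.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The three broad categories above, ordered from least to most sensitive."""
    PUBLIC = 0
    ANONYMOUS = 1
    PII = 2


def classify(plausible_categories):
    """Pick the most sensitive of the categories a piece of data might
    belong to. If nothing is known about it, assume PII."""
    candidates = list(plausible_categories)
    if not candidates:
        return Sensitivity.PII
    return max(candidates)


# An IP address might be anonymous, or might identify someone: treat it as PII.
print(classify([Sensitivity.ANONYMOUS, Sensitivity.PII]).name)  # PII
# A blog post is only ever public content.
print(classify([Sensitivity.PUBLIC]).name)                      # PUBLIC
```

Ordering the categories means the cautious choice is just a `max`, and data nobody has classified yet defaults to the strictest treatment.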

Data processing versus publishing

It's important to differentiate between storing and processing sensitive data, versus publishing data. How it usually works in the real world:

  • Data processing of sensitive and public data happens in a controlled environment such as your CMS, CRM, or email marketing platform. This system is likely to have a range of privacy features, potentially specifically targeted at GDPR or CCPA regulations.
  • Published data, such as web pages or emails, must (by definition) be distributed outside your controlled environment. It's impossible to know how many 'hops' that content will make between your systems and its destination, and how it will be used when it gets there.
  • Jamstack, MACH, or indeed any modern API- or service-driven architecture, allows you to draw a neat and clear line between systems with clearly defined functions, and determine whether these systems are storing or processing sensitive information (such as your CRM) or not (such as your CDN).

It's been a very common practice for years to lock down your whole CMS or DXP in one tightly controlled hosting environment, and then use a global CDN to distribute the rendered pages.

Nowadays, we need to apply the same thinking, and tightly control those systems that do contain sensitive info, and look at the links between systems, rather than try to apply blanket policies to every link in the chain.

So rather than asking 'where is my solution hosted?', ask 'where is my sensitive data being processed?'
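One way to answer that question is to write down the links between your systems and mark which ones carry PII. The Python sketch below is a rough illustration, not a tool we use: the `Flow` type, the system names and the flows themselves are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One link in the chain: data moving from one system to another."""
    source: str
    destination: str
    carries_pii: bool


def systems_processing_pii(flows, external=("visitor",)):
    """Return every system at either end of a PII-carrying link,
    leaving out external actors such as the visitor's own browser."""
    systems = set()
    for flow in flows:
        if flow.carries_pii:
            systems.update({flow.source, flow.destination})
    return sorted(systems - set(external))


# Hypothetical links for a typical service-driven architecture.
flows = [
    Flow("visitor", "forms service", carries_pii=True),
    Flow("forms service", "CRM", carries_pii=True),
    Flow("CMS", "build pipeline", carries_pii=False),
    Flow("build pipeline", "CDN", carries_pii=False),
    Flow("CDN", "visitor", carries_pii=False),
]

print(systems_processing_pii(flows))  # ['CRM', 'forms service']
```

Only the systems in that list need tight data sovereignty controls. The CMS, build pipeline and CDN never see anything that hasn't been, or isn't about to be, published.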

Let's look at an example

Andy's Gadgets has a public-facing website that accepts form submissions including customers' personal information, and which also integrates with an online store with all the regular features - shopping cart, checkout, customer accounts, order history, tracking.

Content Management - Kontent by Kentico

Kontent deals with draft and published website content, integrates with various third-party services to aid with content generation, and has the option to be hosted in Australian data centres.

A Jamstack architecture is used, meaning the headless CMS is only used to provide content to the DevOps tools that build the public site. It is not directly accessible to the public in any way, and access is tightly controlled.

Online Forms - Netlify Forms

Forms are built to submit directly to a secured third-party forms endpoint via the Netlify Forms microservice. This is the only service that processes form submissions, and integrations are managed between it and other platforms such as the CRM.

DevOps, build & hosting - Netlify

Continuous Integration (building the site from code) and hosting are handled by Netlify, which only accesses published content from the CMS via Kontent's public Delivery API, so there is no PII involved.

Online store - Shopify Plus

Published product information is provided by a public API to custom integration elements in the Kontent CMS and the website build process in Netlify. No PII is included in any of this information. 

The cart, checkout and payment gateways are all hosted on and accessed via your commerce service directly, at a shop.andysgadgets.com subdomain.

Content Delivery Network (CDN) - Cloudflare

Cloudflare only distributes and caches published web pages, images and other assets around the globe. The only information processed and published by Cloudflare has already been published publicly on the website, so no PII is involved.

Analysing this example

The PII in forms only flows from end users, through their browsers, directly to the Netlify Forms service. It doesn't touch the CMS, or the build/hosting platform, or the global CDN. You only need to worry about one processor for that set of data, and you are completely in charge of that decision. If you're worried about data sovereignty with Netlify, you only need to look for another forms service, such as Paperform, which is Australian.

Similarly, while customers can be browsing public gadget product information on your website, any PII when they get to the stage of purchasing is only flowing directly from your visitors to your commerce service (in this case, Shopify Plus), and not via any of the other services in your architecture, so it is the only service you need to scrutinise.

Where is my sensitive data being processed?

In this example, it's just the forms and e-commerce services that you need to worry about. The rest of your architecture is only dealing with public or anonymised data. So focus your attention on those two services when trying to determine whether the architecture satisfies your data sovereignty requirements.
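As a rough sketch of that review, here is the Andy's Gadgets inventory in Python. Treat everything in it as an assumption for illustration: the `Service` type and the "AU" region code are made up, and a region of `None` simply means "not yet confirmed with the vendor", not a statement about where any of these providers actually host data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str
    handles_pii: bool
    region: Optional[str]  # None = hosting region not yet confirmed


def services_to_review(services, allowed_regions):
    """Return the PII-handling services whose region is unknown or
    falls outside the regions your requirements allow."""
    return [
        s.name
        for s in services
        if s.handles_pii and s.region not in allowed_regions
    ]


# Inventory for the example above. Regions are placeholders to fill in.
services = [
    Service("Kontent by Kentico", handles_pii=False, region="AU"),
    Service("Netlify Forms", handles_pii=True, region=None),
    Service("Netlify (build & hosting)", handles_pii=False, region=None),
    Service("Shopify Plus", handles_pii=True, region=None),
    Service("Cloudflare", handles_pii=False, region=None),
]

print(services_to_review(services, allowed_regions={"AU"}))
# ['Netlify Forms', 'Shopify Plus']
```

Services that never handle PII are skipped entirely, whatever their region, which is the whole point: the review list stays at two services rather than five.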

For the rest, go with the best!
