In plain English: What does a data architect do?
By Venkatesh Neldurg, Senior architect, MiQ
A blog series explaining some of the concepts, processes and technologies we need to do our jobs – in plain English.
Here’s how I like to think of it.
If data scientists are the people in lab coats, running experiments and finding exciting discoveries, we data architects are the ones who build their laboratories.
But data scientists don’t need test tubes and chemicals. They need:
- Access to the right data, organised in the right way
- Robust systems they can use to test and experiment with that data
- And, most importantly, security and compliance infrastructure so they always know they’re working in a way that’s safe and legal.
Building that environment and getting all those things in place is what we data architects do every day.
And, in the world of programmatic media, our job is especially complicated.
Data architects in other industries manage data systems that can be vast and complex, but are comparatively slow-moving. In programmatic media, the scale of the data is unparalleled (we process two terabytes of data per day), and it has an extremely short shelf-life. Much of the data we use is only useful for seven days at most. Some of it – like real-time weather data or travel information – can be out of date in seconds.
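To make the shelf-life idea concrete, here is a minimal Python sketch of a freshness check. The seven-day and "seconds" figures echo the ones above, but the record layout, the type names and the exact 30-second limit are assumptions made up for illustration, not a description of any real system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical shelf-lives per data type. The type names and the
# 30-second value are assumptions made for this example.
SHELF_LIFE = {
    "browsing": timedelta(days=7),
    "weather": timedelta(seconds=30),
}

def is_fresh(record, now):
    """Return True if the record is still within its type's shelf-life."""
    return now - record["collected_at"] <= SHELF_LIFE[record["type"]]

# A fixed "now" keeps the example repeatable.
now = datetime(2024, 1, 8, 12, 0, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "type": "browsing", "collected_at": now - timedelta(days=3)},
    {"id": 2, "type": "browsing", "collected_at": now - timedelta(days=9)},
    {"id": 3, "type": "weather", "collected_at": now - timedelta(seconds=10)},
    {"id": 4, "type": "weather", "collected_at": now - timedelta(minutes=5)},
]

# Keep only the records that have not expired.
fresh_ids = [r["id"] for r in records if is_fresh(r, now)]
```

In this sketch the nine-day-old browsing record and the five-minute-old weather record are both dropped, even though one is far older than the other: what matters is the age relative to the kind of data.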
Joining the data together
As a data architect, my first question is always: what client challenge are we solving and what data do we need to do that?
The answer will almost always be a mix of first-party data (the data a company has on its clients and potential clients), second-party data (owned by, say, a media agency that has run campaigns for the client before), and third-party data: information from the sources we think will give us insights into the target audience group, such as online behaviour, location and income range.
All that data needs ‘cleaning’, which means filling in any gaps in our knowledge using predictive modelling. So, for instance, we might know from the first-party data that a user is interested in a product the client offers, but we might not know what region they live in. By mapping that against third-party location data, we can fill in the blanks and identify the audience in a certain area.
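The region example can be sketched in a few lines of Python. Note that this is a simple lookup by a shared user ID rather than predictive modelling, and every name and value in it (the records, the `fill_region` helper, the regions) is hypothetical.

```python
# Hypothetical first-party records: we know the interest, but not always the region.
first_party = [
    {"user_id": "u1", "interest": "running shoes", "region": None},
    {"user_id": "u2", "interest": "running shoes", "region": "London"},
    {"user_id": "u3", "interest": "tents", "region": None},
]

# Hypothetical third-party location data, keyed by the same user ID.
third_party_region = {"u1": "Manchester", "u3": "Leeds"}

def fill_region(records, lookup):
    """Return copies of the records with missing regions filled from the lookup."""
    filled = []
    for rec in records:
        # Keep the first-party region when we have one; otherwise try the lookup.
        region = rec["region"] or lookup.get(rec["user_id"])
        filled.append({**rec, "region": region})
    return filled

enriched = fill_region(first_party, third_party_region)

# With the blanks filled in, we can pick out an audience in one area.
manchester_runners = [
    r["user_id"]
    for r in enriched
    if r["region"] == "Manchester" and r["interest"] == "running shoes"
]
```

The first-party value wins whenever it exists, so the third-party data only fills gaps and never overwrites what the client already knows.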
Building data lakes
The next question is: where can we work with this data?
To analyse data, it all needs to be in one place, and we call that place a data lake. Our job as data architects is to pool all the first-, second- and third-party data in a data lake so that our data scientists and analysts can go fishing for insights.
But the location of this data lake is another challenge for an architect to solve. We might ingest the data to one of our own cloud platforms or work within a client’s data centre, depending on the security requirements, the tools we want to use for data analysis and the volume of data in question.
Transferring different types of data to different locations, so that it's always secure and accessible to the people who need to work with it, is a huge part of our role as data architects.
Security as a priority
Everyone understands the need for data security. Obviously, the risks to businesses of misusing or mishandling data are enormous. But really it's about doing what's right for consumers. The more they trust that their data will be used securely and in ways they have explicitly approved, the better the advertising landscape will be.
It’s a large part of a data architect’s job to build and maintain the systems that provide that trust.
So, when we're dealing with a large quantity of consumer data, we need to separate data from opted-in consumers (which we can use for more personalised targeting) from non-opted-in data, which we can only use to gather aggregated insights (i.e. when we build a picture of the way certain types of consumer behave, but without looking at any individual consumer specifically).
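Here is a rough Python sketch of that separation. The event records, the `opted_in` flag and the segment names are all invented for the example; the point is only that the non-opted-in rows leave the function as counts, with no user identifiers attached.

```python
from collections import Counter

# Hypothetical consumer events; the field names are assumptions for this sketch.
events = [
    {"user_id": "u1", "opted_in": True, "segment": "sports"},
    {"user_id": "u2", "opted_in": False, "segment": "sports"},
    {"user_id": "u3", "opted_in": False, "segment": "travel"},
    {"user_id": "u4", "opted_in": False, "segment": "sports"},
]

def split_by_consent(rows):
    """Separate rows from opted-in consumers from everything else."""
    opted_in = [r for r in rows if r["opted_in"]]
    not_opted_in = [r for r in rows if not r["opted_in"]]
    return opted_in, not_opted_in

def aggregate(rows):
    """Count rows per segment, dropping user identifiers entirely."""
    return dict(Counter(r["segment"] for r in rows))

targetable, aggregate_only = split_by_consent(events)

# Only counts survive for non-opted-in data, never individual records.
segment_counts = aggregate(aggregate_only)
```

A real system would also need to think about very small groups, where a count of one or two could still point at an individual, but that is beyond this sketch.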
Every data lake we build has layers of security on it, so the people using the data to find insights and run campaigns can only access and use what they're allowed to. So, our traders in the UK may be allowed to access different data from our traders in the US, depending on the legislation that applies to each.
We also need to keep track of who within our organisation has seen or used which data, so that we're able to answer any questions should any compliance issues arise.
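Those two ideas, region-based permissions and a record of who accessed what, can be sketched together in Python. The policy table, dataset names and `read_dataset` function below are hypothetical and far simpler than anything a production data lake would use.

```python
from datetime import datetime, timezone

# Hypothetical policy: which datasets each regional team may read.
ALLOWED = {
    "UK": {"uk_campaigns", "aggregated_insights"},
    "US": {"us_campaigns", "aggregated_insights"},
}

# Every access attempt is appended here, whether or not it was allowed.
audit_log = []

def read_dataset(user, region, dataset):
    """Check permission for the user's region and record the attempt."""
    allowed = dataset in ALLOWED.get(region, set())
    audit_log.append({
        "user": user,
        "region": region,
        "dataset": dataset,
        "allowed": allowed,
        "at": datetime.now(timezone.utc),
    })
    if not allowed:
        raise PermissionError(f"{user} ({region}) may not read {dataset}")
    # Placeholder for the real data; this sketch has no storage layer.
    return f"contents of {dataset}"

read_dataset("alice", "UK", "uk_campaigns")
try:
    read_dataset("alice", "UK", "us_campaigns")
except PermissionError:
    pass  # Denied, but the attempt is still in the audit log.
```

Logging the attempt before raising the error means refused requests are recorded too, which is often what a compliance question is actually about.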
What makes a good data architect?
To be a good data architect, you need a mix of qualities. The first is the ability to see the big picture on a project easily, while also having extreme attention to detail. We have to keep on top of everything, and the security and usability of every single piece of data matters – but you can't get lost in that detail. You always have to keep in mind the goal we're trying to achieve.
Data architecture is also a highly technical job. You have to be up to speed with a huge range of technologies. I think you can only be really good at it if you have a natural passion for finding out more about how these tools work and what they do. I'm talking reading-about-them-in-your-spare-time levels of geekiness.
The last thing I think is really important is having a head for figures. Much like a regular architect, you always have a number of options for how to do things, each with its own price implications. When it comes to storing, managing, cleaning and moving data, we always have to be aware of budgets, making sure we're doing things in the most efficient way while maintaining the highest quality, especially when it comes to security.