Implementing a new data science and analytics platform: Part 2

Recap

In Part 1, I introduced our journey towards the implementation of a data science and analytics platform. I explained that a data-driven company needs to consider many aspects, from hiring good talent to investing in a new data platform.

We went through a non-exhaustive list of requirements for a good data platform, which we used to shortlist two solutions for a POC: Databricks and AWS SageMaker.

Databricks is a software platform that helps its customers unify their analytics across the business, data science, and data engineering. It provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products.

Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It was already available internally, as Compare the Market (CTM) uses AWS as its cloud provider.

The POC ran for a month, during which we assessed the functionality of both solutions and compared them against each other and, where relevant, against our current environment.

Methodology

The POC was divided into two main parts:

Architecture & DevOps assessment.

End to end testing.

Architecture & DevOps assessment

In this part, the focus was on the platform deployment and administration. We created an isolated AWS account, identical to the main account we use for our daily tasks. We then went ahead with the deployment of Databricks, which we found straightforward. The tests were evaluated against the following categories:

Deployment: How easy it is to deploy Databricks within AWS.

Administration: What features are available for the platform admin, and how effective they are.

Tools & Features: Whether the available tools can cover all our daily tasks.

Performance: Query performance, job performance.

Integration with external services.

End-to-end testing

This is where the solutions were tested in much finer detail, by developing and productionising a machine learning model with both Databricks and SageMaker.

Since Databricks casts a wider net than just machine learning applications, we arranged a 2-day hackathon in which various teams within the Data Function worked through a scripted task list, predefined by representatives of each team (Insights, Analytics, Data Science, etc.).

This part was evaluated using a scorecard that rolled up into various categories such as:

Productivity & Workspace: ease of use, platform performance, stability of the environment.

Collaboration: Collaboration with other users, sharing results and dashboards.

Analytics: Data manipulation, visualisation, and data export.

Data Science: machine learning lifecycle management.

Note: The list above is not exhaustive, just a high-level overview.

Each team member was expected to score various tasks under each category. The scores were then discussed to understand the reasons behind them, and averaged where relevant to get an idea of which option the team preferred (Databricks, SageMaker or the current way of working).
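As a rough illustration of the averaging step, here is a minimal Python sketch. It is not the scorecard we actually used: the member names, the 1 to 5 scale, the option labels and every score below are invented placeholders, and the two helper functions are hypothetical names chosen for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scorecard rows: (team member, category, option, score from 1 to 5).
# All names and numbers are made up for illustration only.
scores = [
    ("member_1", "Collaboration", "Option A", 4),
    ("member_2", "Collaboration", "Option A", 5),
    ("member_1", "Collaboration", "Option B", 3),
    ("member_2", "Collaboration", "Option B", 2),
    ("member_1", "Analytics", "Option A", 3),
    ("member_2", "Analytics", "Option A", 4),
    ("member_1", "Analytics", "Option B", 4),
    ("member_2", "Analytics", "Option B", 5),
]


def average_by_category(rows):
    """Average the individual scores for each (category, option) pair."""
    grouped = defaultdict(list)
    for _member, category, option, score in rows:
        grouped[(category, option)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


def preferred_per_category(averages):
    """Pick the option with the highest average score in each category."""
    best = {}
    for (category, option), avg in averages.items():
        if category not in best or avg > best[category][1]:
            best[category] = (option, avg)
    return best


averages = average_by_category(scores)
for category, (option, avg) in preferred_per_category(averages).items():
    print(f"{category}: {option} (average {avg})")
```

An average like this is only a starting point. In our case the discussion of the reasons behind each score mattered as much as the numbers themselves.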

Implementing a new data science and analytics platform

How do you choose a strong solution for your business? Which platforms are best? There’s lots to consider.

Be more data-driven! This is a sentence we hear more and more. The boom of big data technologies has opened the doors to possibilities we could never have imagined before. The affordability of these solutions makes advanced analytics and data science available to all.

However, having a strong data science and analytics foundation requires a lot of aspects to be taken into consideration and investments to be made:

Hire new talent.

Review the internal technology stack and potentially invest in new technologies.

Put in place proper governance around data-related activities.

Compare the Market started this journey many years ago, by hiring data specialists (data engineers, data analysts, data scientists, etc.) and implementing new infrastructure (Hadoop initially).

This has led to the rapid growth of our data activities, with many positive results (massive processing of unstructured data, machine learning at scale).

However, a few years ago we decided to build on this by implementing a unified data science and analytics platform, as this would be easier to maintain, more cost-effective and more flexible for the work we were doing.

This was a long project, but after 10 months of work, it is now complete!

Over the course of a series of blog posts, I’ll share our learnings from the implementation.

I’ll cover:

Initial decisions.

Proof Of Concept.

Architecture design.

Implementation & Onboarding.

Note: What we are sharing in this blog is not THE way to implement a data science and analytics platform, but a solution that fitted our context.

The problem

Although the business had invested in many data tools, there was no enterprise platform in place for delivering advanced analytics and data science. Given the growing size of the team and the number of incoming projects, we decided to look for a solution that would enable collaboration between team members and allow us to take on new projects.

A few of our requirements were:

Scalability: Not just of the kit, but also of the team, so that it can take on more projects (including upskilling, onboarding, collaborating, etc.).

End-to-end functionality: Ability to tackle various tasks on the platform, through standardised methods/kit, without depending on external resources.

Collaboration: Facilitate joint work on a project.

Skills gap: A platform supporting the skill sets of the major roles (from data analytics to data science to machine learning engineering).

PII / Sensitive Data controls: Meet the data governance and security requirements.

As a result, we reviewed and engaged with several vendors to explore market offerings and find a suitable partner to help us deliver the new platform.

We did extensive research on the vendors listed in Gartner’s latest Magic Quadrant for Data Science and ML Platforms, and added some others we had interacted with over the last couple of years.

Some of the leaders in the Magic Quadrant were ruled out due to:

Operating model: Proprietary software/licensing with high pricing, vendor lock-in and/or a skew towards on-premises infrastructure that would be inflexible to changes in our internal infrastructure.

Performance on key capabilities such as collaboration, advanced analytics, MLOps and scalability.

Finally, we landed on two options for a POC:

AWS SageMaker: Already available to us, as CTM uses AWS as its cloud provider.

Databricks: For the list of features it provides.

The POC ran for one month, during which we developed and tested the functionality available on both platforms. This is the topic of the next post in the series, where we will see how we organised the POC to ensure an objective result and make an informed decision on the way forward.

Breaking a bias shouldn’t be an afterthought

At Compare the Market, we are proud of every one of our employees for the value they add to our business, and bias in our teams is not something we will tolerate. With International Women’s Day in mind, we caught up with Dimple Dalby, one of our amazing Engineering Managers, about her experience of bias in her career and what advice she would give to the next female generation to keep stamping this out in the workplace.

What does International Women’s Day mean to you?

It’s a day to celebrate how far we have come, everything we have accomplished and to remind ourselves that the journey is not over! There is still more work to do so that the younger generation, and the ones after that, have a better and unbiased world to step into.

What does bias mean to you?

Bias to me is simply discrimination based on factors such as gender, race, age and ethnicity. It happens when people are not ready to accept anything — ideas, thoughts or actions — that is outside of the norm for what they are used to. And for us as women, we have centuries of biases to break and now is our chance to set up a new and equal norm for ourselves where we challenge the everyday inequalities.

Have you come across bias in your career?

Yes, of course… many times!

And while sometimes it has happened directly to me, I have also been a witness to it with people I have worked with. Either way, it’s not a nice place to be and can make you feel angry, inadequate and in the worst case, helpless.

The two areas where I have come across a strong bias in my career are:

Promotion to a leadership position.

Obtaining equal pay.

There were a few years in my career where I felt I had hit a brick wall: I had been promoted to Senior Engineer but was constantly knocked back when applying for a lead role.

And this happened despite me leading many major projects — when it came to officially being given the role, I wasn’t even considered. I was never offered a solid reason why, just told that I was not ready and that it takes women many more years to gain the confidence to be able to lead compared to men. I wasn’t offered a path to train myself to be ready either.

It wasn’t until I joined Compare the Market that I got my first break into a lead role. Not only was I trusted to be the lead of a very important team, I was also given six months of leadership training to help me grow in my role and fill any gaps I may have had.

In addition to this, I had many senior leads reach out to me offering support, so while the journey was not always an easy one, I always felt that any help I needed was just an ask away.

How have you challenged that bias?

Challenging didn’t always come naturally to me and because some of these biases are so deeply ingrained in our society, challenging them can feel daunting.

But when I did finally have the courage to challenge, it wasn’t always handled productively or professionally to begin with.

Being overlooked in your career, based purely on your gender, can make you feel angry and anger can lead to a lack of clarity in your thoughts followed by unprofessional reactions that you may later regret.

When faced with bias at work, whether it was finding out that all my male colleagues were paid far more than me or being unfairly overlooked for a promotion, I have always tried to step outside the situation and analyse it.

Often my first reaction has been to feel angry, which I think is OK and justified. But to channel it well, I write down my thoughts about the situation and how I would like to respond, to avoid explosive or unprofessional reactions.

A few top tips that I have learnt along the way are:

  • Good and effective communication is key when tackling a sensitive issue.
  • Be direct and never be afraid to speak your truth and voice your concern.
  • Keep a record of the communications you have had.
  • Agree on an action plan with all parties involved.

In situations where it repeats itself and you feel it’s a much bigger situation than you can handle yourself, get HR involved.

What advice would you give to the next female generation to help break the bias?

You are not alone in this even though it might often feel like that when you are a minority.

Don’t feel afraid to challenge the biases that you face as that is the first step towards change.

Only when enough of us challenge can it get highlighted and get the traction that it deserves. Being subjected to bias can make you feel under-valued, leaving you questioning your own confidence.

Don’t let this deter you from what we are all looking to achieve: #BreakTheBias.

What advice would you give to employers?

Breaking biases should not be an afterthought.

Introducing unconscious bias training is only the beginning. This needs to be followed up with measuring the impact and outcomes that it has on employees to form the basis for longer-term planning and investment.

And as with everything, transparency is key. Always remain open and honest about criteria relating to areas such as internal promotions and hiring for new roles.