A closer look at Data Science Delivery

According to Gartner, only 15% to 20% of data science projects are ever completed, and CEOs say that only about 8% of the completed ones generate value. Despite these numbers, data science is still considered an opportunity for business growth. These figures linger in the back of the mind of everyone involved in a data science project. They motivate data scientists, and even more so delivery managers, to improve their work and constantly re-examine project delivery and management approaches. What’s behind the high failure rate, and how can we change it?

Let’s examine the project flow we like to follow at Things Solver when working on products and solutions for clients who come from different industries and expect different business benefits from our deliverables.

At the very beginning of a client-vendor partnership, we are typically approached as someone who can make company X data driven and thereby enable smart, automated decisions in the never-ending game of market competition. Our first step is to get an accurate picture of the client’s strengths, weaknesses, opportunities and threats, examined through both the business and the data science lens. From half a decade of experience providing clients with data-driven solutions, we have come across two patterns of early-stage partnerships. Company X (the client) wants a data science product or solution either because they have read or heard about the positive impact data science can bring to their business, or because data science is expected to be the company’s last hope for finding additional revenue streams and thereby saving its future. Two scenarios usually follow. In the first, company X has no idea how it is going to become data driven, which areas of the business the shift should affect, or how it can use our deliverables to maximise their value and business impact. They just know they want to become a data-driven company and to be perceived as such. In the second, company X has a pretty good idea of what it wants from data scientists: they have noticed that their business operations need enhancement in a certain area, they can provide domain expertise for tackling the problem, and data scientists are expected to materialise the solution. While the second scenario sounds better, starting from the first does not mean the project will be less successful or fail in the end.

So, when the client has the enthusiasm to start a data science journey but only a vague idea of how, we like to organise a couple of workshops with business representatives across the company’s departments and talk and listen, A LOT. That way we are able to help the client sketch the contours of their future data strategy. Initially, we like to talk about:

  • Business strategy
  • Company’s portfolio
  • Business operations
  • Problems encountered (strategic and operational)
  • Business plans and goals
  • Important KPIs they track
  • etc.

That way we can identify the potential use cases we will work on. Although we have usually already worked in the same industry and on similar use cases, this is always a perfect opportunity for us, as a vendor, to enhance our business understanding and keep up with industry trends.

Once the potential use cases are defined, we like to discuss the data itself. Searching for answers to questions like the following helps us:

  • What is the company’s current data collection and data management strategy? 
  • What infrastructure supports it?
  • Is it flexible?
  • What data sources are available and to what extent? 
  • What is the data quality (high level assessment)?
  • Is there a possibility to collect additional data (whether nice-to-have or absolutely necessary for the identified use cases)?
  • etc.

Once we have business understanding and a general data understanding, we evaluate the potential use cases on a couple of levels. We ask the client to evaluate business impact and priority, while we evaluate technical complexity and the effort required. Together, these let us separate use cases into four categories:

  1. Quick win 
  2. Strategic
  3. Low value
  4. Items to cut


Quick win use cases are generally the best starting option. They have significant business impact, so domain experts evaluate them as high priority. Moreover, they do not seem overly complex and as such do not require enormous effort on the data science side. Quick wins are relatively fast and visible project gains, and they can serve as trust builders between the vendor and business stakeholders.

By now, we as a vendor and company X as a client have the following: initial business understanding, a clear definition of business problems, a list of potential use cases and initial data understanding. The next step is to define:

  • What is the use case we will be working on first?
  • Who is the case owner?
  • Who is responsible for timely delivery from our (vendor) side? 
  • Which domain and technical experts will be allocated to the project from the client’s side?
  • When are the internal and external checkpoints and deadlines in the project timeline?
  • etc.

This is when we start the scoping phase of the project. Scoping of the chosen use case should start with product/solution requirements coming from the business side; these can sound like “we want to know what the main churn drivers are and how to prevent customer churn”. The scoping phase also includes initial solution ideation, which relies mostly on the data scientists and means creating rough sketches of possible solutions. After this, data engineers (and/or data scientists) work on data preparation and accessibility. The scoping phase should end with a finalized scope and approved KPIs (business and technical) that will be monitored from then on.
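The outcome of scoping can be captured in a simple, explicit record. Below is a minimal sketch in Python – the fields and the churn KPI values are hypothetical illustrations, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class Scope:
    """Finalized scope for one use case, agreed at the end of the scoping phase."""
    use_case: str
    business_requirement: str                          # what the business side asked for
    business_kpis: dict = field(default_factory=dict)  # monitored after delivery
    technical_kpis: dict = field(default_factory=dict)
    approved: bool = False

# Hypothetical example for the churn use case mentioned above
churn_scope = Scope(
    use_case="Customer churn",
    business_requirement="Identify main churn drivers and prevent customer churn",
    business_kpis={"monthly_churn_rate": "reduce by 2 p.p."},
    technical_kpis={"model_recall": ">= 0.70"},
    approved=True,
)
```

Writing the approved KPIs down next to the scope makes it harder for them to silently drift during the research and development iterations.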

The research phase comes right after scoping. It deals with data exploration, literature and solution review, technical validity checks, research review, and validation of the scope and KPIs. During research it is important not to lose focus and get paralyzed by the many possible approaches to a solution.

In the development phase, the team works on the experiment framework setup, model development and testing, and KPI checks. It is important to note that both the research and development phases are highly iterative and usually take a couple of weeks per iteration.

Finally, in the deployment phase the focus is on productization of the solution and monitoring setup. That completes the delivery cycle; what remains is maintenance of the delivered solution and a long-lasting partnership.

The delivery process I have just described is something we try to follow whenever it is possible and reasonable, but of course that doesn’t mean 100% of projects follow this exact flow. Having worked on numerous data science projects so far, I would like to point out some findings, or better said, lessons learned:

  • There is no one-size-fits-all approach or methodology for delivering a data science product/solution;
  • Not every use case is destined to be highly successful. Sometimes even not-so-accurate models will deliver enough value: data-driven decisions that are good enough and fast enough;
  • Not every highly accurate machine learning model will generate the expected business value;
  • Not every business problem can or should be solved with a (complex) machine learning model, even when business stakeholders pin all their hopes on it. Sometimes “simple and stupid” solutions are a much better option;
  • Sometimes starting development of a solution for a particular problem will take the project in unexpectedly different directions: insights from the initial project will reveal more interesting and/or more urgent problems to solve;
  • Sometimes business stakeholders will have completely opposite, or just different, visions of the problem scope and solution than the data team. That is when data scientists should take into account all the information and insights coming from both sides and work towards the objectively best option.

Cover photo taken from: https://unsplash.com/s/photos/delivery

How we deliver value at Things Solver – Focus on the goal

I didn’t want to start this post by quoting the percentage of data science projects failing to deliver commercial value or business utilization. Many data science projects fail – and that’s a fact. What’s important is to make sure you are not spending hundreds of thousands of euros or dollars, and other resources, only to fail.

Being successful at implementing data science, machine learning, data pipelines and data ops in business is complex. There are many prerequisites for successfully implementing such a project – ranging from operational and technical ones, such as data availability and quality or a lack of resources, to strategic ones, such as the organization’s focus on implementing data products. We can refer to any product that aims to achieve an end goal through the use of data as a data product.

Over the years we have had the opportunity to work on many interesting projects across industries. Many of them achieved great success and became industry-awarded products (https://thingsolver.com/why-solver-ai-suite/, https://www.lightreading.com/artificial-intelligence-machine-learning/meet-sara-the-ai-that-may-be-a-big-deal-for-austrias-main-telco/d/d-id/755428). And some of them didn’t share the same destiny and are not commercialized yet.

Over time, we have tested different approaches to maximizing the outcome of our work and keeping our data projects under control. The core value we nurture when implementing data products with our customers is fail fast: test in the early stages whether the goals we set can be achieved with the current state of the data landscape, and focus our energy on more important tasks if not. From where I’m currently standing, there is no better way, especially for new areas of analysis and research, or for companies that are new to the industry.

Throughout the years, I’ve had a chance to learn A LOT about different approaches to delivering data products to different industries – what should work, and what most definitely won’t. 🙂 In the following series, I will share our experiences and the approaches (the ones that proved to work for us) that are incorporated in the Things Solver DNA of how we deliver value from data to our customers.

And the first question that comes up for any project you are working on – where’s the problem?

A perfect problem to be solved

For any problem you are trying to solve, it is crucial to clearly understand whether the problem is worth solving. This doesn’t go only for advanced analytics; it’s important for any task you are trying to accomplish. It doesn’t matter if your solution is the most brilliant in the history of the universe – unless it solves a problem worth solving, it’s useless.

Over the years, we have played with different methodologies for defining a problem worth solving and identifying its key determinants. It’s always important to ensure our customers are investing in something that will benefit them significantly. To that end, we have established assessing the client’s data strategy as the first step. Throughout this phase, we work closely with the customer’s team to identify their maturity and data proficiency, and to list all the potential use cases they can think of – without going into details about how achievable and realistic those are. The main goal is for different departments to get creative and think deeply about their core business challenges and the goals that should be achieved. While identifying potential use cases, it is crucial to gather stakeholders from various backgrounds – depending on the industry and the area we are aiming to address, we typically include Business Development, CRM, Marketing and Sales, technical managers, finance, customer service, and operations managers. Working with different industries over the years has enabled us to understand the main challenges and opportunities for many businesses, which helps us guide stakeholders during these workshops and help them focus. It also helped us identify a common set of goals companies aim to achieve, and voila – our first product, Solver AI Suite, was born.

At this stage of finding a perfect problem to solve, we focus mostly on guiding our customers and helping them be creative, not on imposing our own opinions. Customers should be able to focus on what really matters for their business, without us being subjective about the tasks we would very much like to work on or find important based on our experience with other customers. 🙂 On the other hand, experience across industries helps a lot in the advisory role.

This part of the process is very creative and fun. But since this is the starting point from which the true value for the business is created, it is crucial for it to be well structured, planned and guided. At this stage we aim to cover the whole business strategy of a company, so it can be quite exhaustive – we typically split it into multiple well-structured and well-prepared workshops.

How well do you understand what you are trying to achieve?

Once we have a clear list of goals to be achieved and problems to be solved, we move on to prioritizing their implementation and making a roadmap for the organization to achieve them – finding the perfect first use case to start with. Two crucial measures should come out of this process: Impact and Effort.


How impactful is each use case for the overall strategy and operations of the business? At the end of the day, this is how any project should be measured – how much better will my business be after I implement this? It’s always great if you can focus on the things that matter most and help achieve the goals of the highest impact. In order to measure impact objectively, it’s important to include different teams – our goal is to see how impactful resolving each problem is for the whole organization.

Our second goal for this step is identifying how well aligned different business units are in terms of overall company goals, and who the key stakeholders in the process are. With that in mind, we can understand a problem more clearly, but also how different teams communicate around different processes and daily operations, and what their roles are. It’s also a safeguard against missing relevant stakeholders in the process.


How much effort and how many resources are needed to achieve these goals? Prioritization is important in the sense that we always want to focus on the topics that matter most. But effort estimation comes in neatly to ensure that we can deliver value without spending months or years on a single use case – important tasks that require less effort are, in most cases, our priority.

In this phase, it’s very important to take into consideration the existing data landscape in the organization – the data points that exist, their quality and availability, internal and external data management policies, etc. The many years of experience we have working with different industries on different use cases enable us to look more objectively at the complexity of dozens of use cases and, in cooperation with the client, identify which goals can be achieved in the short term and which use cases require many stages to enable their implementation.

What we always advise our customers is to focus on challenges that bring significant impact to the business without too much effort. This is especially important for organizations that are new to advanced analytics and are just starting to implement use cases in this domain. With this approach, we aim to make the whole organization a believer in machine learning – we aim to get impactful results fast so that, together with the key stakeholders, we can get the support of the whole company to invest effort and resources in this area. Low-hanging fruit is generally a good target for newcomers who are just starting to experiment with machine learning and AI. It helps them understand the process, the way of work and the methodology, and prepares the team for bigger challenges.

In most cases, identifying effort is more challenging than identifying impact – there are many variables to take into account, and it often requires a few iterations from both sides – ours and the customer’s – to determine it. We aim to measure factors like data availability and quality, our and the client’s previous experience in the area, similar or planned projects in the area, the complexity of the analysis itself, and so on. It is crucial to foresee which blockers may exist and how we can overcome them. It often turns out that the data needed for the analysis is either not available or not of the required quality, so we have to make a plan for how to provide it.

Estimating any task is complex, and with data science in particular we are never able to tell whether something will work 100% – in the end it depends on the data, and on the processes and behaviors that data describes. Over- or underestimating shouldn’t be a dealbreaker if we have the process under control and are able to adapt to changes and new facts.
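One way to keep effort estimation consistent across those iterations is to score each factor on a shared scale and combine them with weights. A sketch, assuming hypothetical factors, weights and a 1–5 scale (1 = favorable, 5 = problematic); in practice the numbers get revised as new facts emerge:

```python
# Hypothetical weights: how much each factor drives overall effort (sum to 1.0)
WEIGHTS = {
    "data_availability": 0.30,
    "data_quality": 0.25,
    "team_experience": 0.20,
    "analysis_complexity": 0.25,
}

def effort_score(scores: dict) -> float:
    """Weighted sum of per-factor scores (1 = favorable, 5 = problematic)."""
    return round(sum(WEIGHTS[f] * scores[f] for f in WEIGHTS), 2)

# Example: available, decent-quality data, an experienced team,
# and a moderately complex analysis -> low overall effort
print(effort_score({
    "data_availability": 2,
    "data_quality": 2,
    "team_experience": 1,
    "analysis_complexity": 3,
}))  # 2.05
```

The point of a shared rubric is not precision; it is that two estimation rounds, weeks apart, disagree for visible reasons rather than gut feeling.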


Once we are able to estimate the effort required for each use case, we can proceed to building a roadmap for achieving all the identified goals. As the outcome, we want to determine different sets of use cases:

  • Quick Wins: problems that are impactful and require less effort – low-hanging fruit
  • Strategic: problems that are impactful but require significant effort – usually requiring a few iterations
  • Parked: problems of lower impact and higher effort – areas where we currently shouldn’t focus our energy
  • Qualifiers: problems of lower impact and lower effort – topics that can be addressed quickly, but will bring modest impact.

We aim to prepare a roadmap for achieving all of the identified goals (please note – a roadmap, not a plan). What’s truly important for any organization is to revise its use cases, impact, effort and the roadmap frequently – at least once a quarter – since the business landscape, strategy and main goals may change.
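The quadrant logic above, plus the quick-wins-first ordering, can be sketched in a few lines of Python. This is only an illustration – real impact and effort scores come out of workshops and iterations, not a formula, and the 0–1 scale and example use cases below are hypothetical:

```python
def categorize(impact: float, effort: float, threshold: float = 0.5) -> str:
    """Place a use case in one of the four quadrants (scores normalized to 0-1)."""
    if impact >= threshold:
        return "Quick Win" if effort < threshold else "Strategic"
    return "Qualifier" if effort < threshold else "Parked"

def roadmap(use_cases: dict) -> list:
    """Order use cases: Quick Wins first, then Strategic, Qualifiers, Parked."""
    order = {"Quick Win": 0, "Strategic": 1, "Qualifier": 2, "Parked": 3}
    return sorted(use_cases, key=lambda name: order[categorize(*use_cases[name])])

# (impact, effort) pairs - hypothetical scores for illustration only
cases = {
    "churn prediction": (0.9, 0.3),
    "demand forecasting": (0.8, 0.8),
    "report automation": (0.3, 0.2),
    "legacy data migration": (0.2, 0.9),
}
print(roadmap(cases))  # Quick Wins first, Parked last
```

Because `sorted` is stable, use cases within the same quadrant keep their original order, which leaves room for the client’s priority ranking to break ties.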

What’s our scope?

Once we are satisfied with the results of our ideation, we move to the last stage of Problem Definition – defining a scope. Scoping has to start with specific business needs and objectives from the list of use cases – ones we can measure. The primary question we should ask ourselves is which actionable insights we are getting as the outcome, and how they fit the current business landscape.

Any scope should cover two really important dimensions:

  • Product: what should the output of the project be, and how will it be used on a daily basis?
  • Metrics: how do we measure that we have achieved a goal?

It’s good practice to try to summarize the scope in a single specific sentence – even though everyone is aware of the multiple milestones, subgoals and touchpoints to be made down the road – as another addition to a clearly defined scope.

As the ones in charge of achieving these objectives, we have to focus on the data points we need and on the analyses and methodologies we aim to perform.

By taking into account the data points, product requirements, relevant business processes and the work required for the planned scope, we are able to identify which skills and roles are needed to accomplish the goals – data engineers, data scientists, UI/UX designers, frontend and backend developers, data ops to manage pipelines, etc.

The whole process of defining and scoping a perfect data science problem should be heavily data informed to define a realistic project scope and set realistic expectations. 

For us internally at Things Solver, the whole process helps determine our offering for achieving the goals – whether and to what extent our products and services fit into the landscape – and, at the end of the day, whether we are a good match for achieving them.

After going through the whole process of brainstorming, collecting data, revising business plans and strategies, teams and processes, use cases and requirements, integrations and flows, communicating heavily within and across teams, and defining milestones and subgoals – we get The Scope. The Scope determines the start, the flow and the destiny of the project.

This is where our project starts. 🙂


Photo credits: https://www.pexels.com/photo/black-and-white-dartboard-1552617/