Be one step ahead: Solver AI Suite short overview

Intro and motivation

In the beginning, there were three separate projects: MoMa, for managing machine learning models; Fibi, for forecasting with multiple models to get the best results; and Coeus, a business solution for segmentation and recommendations with a personalized view and basic campaigning functionality.

As time went on and more business cases came our way, we started to extend Coeus with new functionality. Eventually it needed the capabilities of the other two solutions as well, so we merged all three.

The name was hard to pronounce, and not every client needed every feature, which triggered us to rethink and rebrand the product and to decide that it should be a fully modular platform that can suit every client. Thus, Solver AI Suite was born.

Modularity and Pillars

We first focused on creating the base of the platform, which let us stack building blocks on top of it, made of many independent machine learning models and services.

Every instance of the platform needs the base layer, which we call the Solver Foundation. It contains the API used for integration tooling and for communication with the building blocks on top of it.

Modules can be deployed independently, and clients choose which modules they want to purchase. To make it easier to match the desired functionality, we also offer suggested sets of modules.

On top of every set of modules we added another API layer, called the Business API, which makes the data easier to reach for business users and for full-solution integration.

A graphical user interface sits on top of every Business API, creating full pillars of functionality for solving different business cases.

Note that both the Business API and the Apps are built with modularity in mind, so clients don't have to use our suggestions at all.

Business Use Cases

After we presented the idea to existing and new clients, the feedback was positive, and the things the platform didn't solve at the time were added to a prioritized roadmap that we still use to expand it today.

 Some of our pillars are:

  • Solver AI Studio – Enables training, evaluating, and pushing ML models to production. It also tracks model versioning for the client.
  • Forecast Studio – Lets customers run forecasting models on any dataset they add.
  • Solver Smart Segmentation – Gives users a complete overview of all segmentation models for all customers.
  • Campaigning – Puts the power of all ML models and communication-channel integrations into platform users' hands, so they can target a specific set of customers with ease.
  • Solver Virtual Buyer – Helps users understand and meet the needs of their unknown customers.
  • Solver Personalize – All the data about each customer in one place.
  • Solver Anomaly Detection – Alerts users to sales spikes and sharp drops as they happen, so they can understand them better.
  • Solver Process Miner – Helps you find out how your customers move through defined processes when interacting with your company.
  • Solver Power Leads – Tracks customer behavior as they search for the products they want.
  • Solver Product Analysis – Analyzes your products to help improve your offer.
  • Reporting – Shows how your team uses the platform and lets you generate different reports.

Architecture and Security

Every module in the Solver AI Suite is built as a microservice, deployed to Kubernetes clusters and wrapped in the Istio service mesh. That combination lets us deploy the platform to any public cloud our clients decide to use.

Istio enforces mTLS between services using sidecar containers, which means two-way authentication is required before data can move between them.

On top of that sits the Istio Gateway, which lets us automatically expose newly deployed services to our clients.

The CI/CD process is implemented by the book: GitHub Actions that run tests after pushing the image to the registry have saved us a lot of time on the same old boring manual steps.

At the service level we use gRPC for internal communication, with binary serialization that makes internal data transfers more than 10x faster than a REST API; since the binary payload needs a .proto file to be decoded, it also adds another level of security. External communication is handled with FastAPI, and if a service needs a database it gets its own, again for security reasons.
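To make the split concrete, here is a minimal sketch of that pattern (not our actual service contracts): it assumes a hypothetical forecast.proto already compiled with grpcio-tools into forecast_pb2 / forecast_pb2_grpc, and all service, method, and field names are illustrative.

```python
# Sketch of the external-FastAPI / internal-gRPC pattern described above.
# Assumes a hypothetical forecast.proto compiled into forecast_pb2 / forecast_pb2_grpc;
# service, method, and field names are illustrative only.
import grpc
from fastapi import FastAPI

import forecast_pb2
import forecast_pb2_grpc

app = FastAPI(title="Example external API")

# A plaintext channel: in the real deployment the Istio sidecars handle mTLS,
# so application code stays unaware of certificates.
channel = grpc.insecure_channel("forecast-service:50051")
stub = forecast_pb2_grpc.ForecastStub(channel)

@app.get("/forecast/{dataset_id}")
def get_forecast(dataset_id: str):
    # Translate the external REST call into the internal binary gRPC call.
    request = forecast_pb2.ForecastRequest(dataset_id=dataset_id)
    response = stub.Predict(request, timeout=5.0)
    return {"dataset_id": dataset_id, "points": list(response.points)}
```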

Data Management

To integrate with different customers across different industries, we needed to create our own data models that can feed the ML algorithms with the data they need.

And since we use microservices with modern technologies, most of our integration work goes into this part. We make sure the data is relevant, clean, and usable before we start feeding it to the ML models.
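As a minimal sketch of what "relevant, clean, and usable" can mean in practice, here is an illustrative check on a hypothetical customer-transactions extract; the column names are made up and do not reflect our actual data model.

```python
# Illustrative relevance/cleanliness checks on a hypothetical transactions extract.
import pandas as pd

def prepare_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()

    # Keep only the columns the downstream data model expects.
    df = df[["customer_id", "transaction_ts", "amount"]]

    # Drop rows that cannot feed the models: missing keys or non-positive amounts.
    df = df.dropna(subset=["customer_id", "transaction_ts"])
    df = df[df["amount"] > 0]

    # Normalize types and remove exact duplicates.
    df["transaction_ts"] = pd.to_datetime(df["transaction_ts"], errors="coerce")
    df = df.dropna(subset=["transaction_ts"]).drop_duplicates()

    return df
```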

APIs

Internally, we strongly believe in an API-first approach, since the API is the part of the platform that both our developers and our clients use to communicate with it. Every module is designed up front using the OpenAPI 3.0 standard, so it can meet the demands of every system it needs to integrate with.

Along with that, we noticed two types of API users who consume the data: deep-integration users and business users. As the names suggest, they consume it differently: the first group wants to go as low-level as possible and will probably want to tweak everything to suit their needs, while the second group wants to reach the relevant information as fast as possible.

To satisfy both, we created the Foundation API, used for communication at the base and module level, and the Business API, which calls several modules at the same time to extract the relevant information for each business case.
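A minimal sketch of that split could look like the following; the module URLs and response shapes are hypothetical rather than the real contracts, and it assumes the httpx client library is available.

```python
# Sketch of the Business API aggregation pattern: one business-level endpoint
# fans out to several module-level (Foundation) calls and merges the answers.
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI(title="Example Business API")

# Hypothetical module endpoints behind the service mesh.
SEGMENTATION_URL = "http://segmentation-module/customers/{cid}/segment"
FORECAST_URL = "http://forecast-module/customers/{cid}/forecast"

@app.get("/customers/{cid}/overview")
async def customer_overview(cid: str):
    async with httpx.AsyncClient(timeout=5.0) as client:
        # Call two modules concurrently and combine their responses
        # into one business-level answer.
        seg_resp, fc_resp = await asyncio.gather(
            client.get(SEGMENTATION_URL.format(cid=cid)),
            client.get(FORECAST_URL.format(cid=cid)),
        )
    return {
        "customer_id": cid,
        "segment": seg_resp.json(),
        "forecast": fc_resp.json(),
    }
```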

GUI

Like the other building blocks of the platform, the GUI is made of several Apps joined together by a central hub called the Solver Portal.

Every App is created as a set of single-purpose widgets, each matching one Business API call. Users can rearrange them as they see fit, and we store the settings per user so they can be more productive every time they use the app. On the left side there is a familiar section with tabs and main settings, so users feel at home when switching from one app to another.

The graphical user interface follows Material Design, with our own internally made touches to differentiate it from the many web applications on today's market.

Along with the Apps, users can access monitoring modules, settings, and the API documentation.

Next Steps

Along with expanding our catalog of business solutions built on ML models and neural networks, we see an opportunity to split the GUI using a micro-frontend architecture. That will let users move widgets across different apps, giving customers full control to create the single best overview of their business. It also opens the option of integrating individual widgets into their existing software.

Conclusion

There is no one-size-fits-all in the business world, just as in everything else, so this approach has saved us the time and money of customizing solutions on a per-client basis. In the next post, I will go deeper into how we serve ML models and how we manage their lifecycle.

Friday talks: EDA done right

Main challenges 

Although exploratory data analysis (EDA) is often seen as an initial step that should be straightforward, there are challenges that can slow it down and make the process poor and painful. Some of the challenges I have encountered so far are listed below.

Poorly defined business problem (and not understanding it). Not having a clear problem to solve can make you wander around without a specific goal, which can be positive and productive, but in most cases you will feel lost and won't know what to do with all the data in your hands. On the other hand, if you don't understand the main issues the business is facing, you will have trouble extracting helpful insights, since your focus will be in the wrong direction.

Not having the right data (nor talking to the right person). Even when the problem is defined and well understood, not identifying the right datasets, or not having the chance to talk to the person who knows the data in detail, can make the EDA a hell of a ride. Neither you nor the client will benefit from or be satisfied with the EDA results, and that is not what you want from this process. Make sure you have the right data and the right "go-to" person for every question related to domain clarification, data gathering and merging, etc.

Messy data and (no) warehouse (causing a defensive attitude from the "go-to" person). In most cases the data will be messy: foreign key mismatches, no IDs to join information from multiple sources on, wrong calculations, and so on. Sometimes, when you try to merge datasets and find differences in IDs, or duplicates, or something else, and you go to the person in charge of data maintenance, that person may go rogue. They focus on explaining the reasons for the mismatches and the mess rather than on giving directions on how to make things right, or on doing it themselves. Be clear about what you want to do: you want to clean up your data (and get help doing that, if needed) in order to show how data science could help improve some process, not to point out the messiness and the neglect of the people in charge of data maintenance. (A small pre-merge sanity-check sketch follows the last challenge below.)

EDA done on auto-pilot (reports being containers, not a treasury of insights). Sometimes the problem is that EDA is seen as boring and oversimplified. It is done just to follow some defined flow, to be able to say you have done it, before jumping straight into sophisticated and complex ML algorithms. Most problems can be solved in the early stages of EDA; it is not easy, but if done right, you're halfway there. Next time you're doing EDA, rethink your approach to see whether you are skipping steps and doing it with half a brain just because you find ML more interesting (which IMO is unacceptable: thorough EDA and data understanding are prerequisites for applying ML).

Not having the big picture. Remember the main purpose of EDA and the goals you want to achieve through it. Not knowing why you are doing something will suppress your creativity, innovation, and critical thinking, and results in one-off insights. EDA for its own sake is allowed, but it is more applicable and useful if you do it to facilitate the analysis and steps that will follow.
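As promised above, here is a minimal sketch of pre-merge sanity checks that keep the "messy data" conversation factual; the table and column names are purely illustrative.

```python
# Illustrative pre-merge checks for ID mismatches and duplicates.
import pandas as pd

customers = pd.read_csv("crm_customers.csv")             # hypothetical source A
transactions = pd.read_csv("billing_transactions.csv")   # hypothetical source B

# How many transactions actually have a matching customer record?
matched = transactions["customer_id"].isin(customers["customer_id"]).mean()
print(f"Transactions with a matching customer: {matched:.1%}")

# Are the keys unique where we expect them to be?
print("Duplicated customer IDs:", customers["customer_id"].duplicated().sum())

# Merge with an indicator so unmatched rows stay visible instead of silently dropping.
merged = transactions.merge(customers, on="customer_id", how="left", indicator=True)
print(merged["_merge"].value_counts())
```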

How-to EDA

To make this process understandable, I have tried to present the main steps and guidelines in the image that follows (from the client-vendor perspective, but an analogous approach can be applied otherwise).

Business problem definition

If you want your EDA to make sense and have a purpose, start with the problem. In this step, the most important thing is to listen to what the client is saying. It often happens that they know which data is useful but don't have the expertise to utilize it. On the other hand, maybe they have tried to perform the analysis and solve the problem using the data manually, and your job is to help them speed up the process. In some cases it may even happen that they have never joined information from different departments and don't have an overview. Many different scenarios are possible, and that is why it is important to listen and not make any assumptions. Translating into an analytics problem means understanding if and how data analysis can help solve the issue. Defining the main pillars of analysis means identifying the perspectives that could be applied: what the main entities or business areas to analyze are and how they are connected. The main output of this step is to arrive at a problem that should be better understood and, finally, solved.

Lessons learned: don't make assumptions; let the client communicate the biggest issues.

Data sources identification

Sometimes, there can be hundreds of input sources coming from various systems and placed in different locations; the goal of this step is to identify which sources contain the data that best describes the problem you want to model and solve. Not all sources are (equally) important. IMO, it is better to start small and assemble a representative input dataset from a couple of different sources for a tailored analysis (a small scoping sketch follows the lesson below) than to sit on a vast amount of uninvestigated data without knowing where exactly to start. Having big data can be good, since it can describe different areas of the business, but at the same time it can be your worst enemy if you lack focus or don't know how to filter the information you need at a given time.

Lessons learned: don't start with tens or hundreds of tables without knowing how to join them or how to filter the relevant information.
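For illustration, the "start small" scoping step might look like the sketch below; the file, table, and column names are hypothetical.

```python
# Illustrative scoping: two representative sources, a recent window, few columns.
import pandas as pd

orders = pd.read_parquet("exports/orders.parquet")        # hypothetical source
customers = pd.read_parquet("exports/customers.parquet")  # hypothetical source

# Scope to one recent period and only the columns needed for the defined problem.
recent = orders[orders["order_date"] >= "2021-01-01"]
recent = recent[["order_id", "customer_id", "order_date", "amount"]]

# One focused join instead of pulling in every available table.
sample = recent.merge(
    customers[["customer_id", "segment", "region"]],
    on="customer_id",
    how="inner",
)
print(sample.shape)
```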

Set EDA baseline

Okay, to set one thing straight – doing EDA just to be compliant with some methodology sucks. EDA is the main prerequisite for a fruitful and successful analysis, based on data, statistics and machine learning. Doing EDA without purpose or clearly defined goals will make it painful, useless, and overwhelming. There are crucial points to be defined as a baseline for doing EDA:

  • defining the business problem (e.g. a high churn rate, or poor CAPEX planning)
  • defining the purpose of the analysis (e.g. getting familiar with the data and the main relationships within it, and understanding its predictive power and quality for future analysis)
  • defining the goals of the analysis (e.g. extracting insights that describe the most affected pillar of the business, possible directions for improvement, etc.)
  • defining the working infrastructure (e.g. sometimes the initial dataset has millions of records, which requires a working environment in which data manipulation is possible and does not take a lifetime)
  • defining the stakeholders, i.e. the main people that should be involved (the go-to person(s) for the data, and the key people who could gain insights and benefits from the analysis)

Lessons learned: make sure you have all the prerequisites satisfied – business problem, purpose and goals, working infrastructure, and stakeholders.

Perform EDA

Be creative and utilize everything you picked up in the first step, the business problem definition. Think about everything you have learned so far from your own experience. Use analogies: although different businesses have their own mechanisms, it often happens that an analysis you have performed in one use case can be applied to another.

There are two main purposes of exploratory data analysis:

  • getting to know the data, understanding the business through the data, and gaining an impression of how the data can be used to your advantage
  • presenting insights that either confirm or refute current beliefs about business performance, and telling a story about how the data can serve as a baseline for a more sophisticated solution that improves operational and strategic processes

To do that, one has to understand that although correlations and visualizations, for example, are a must-have and a helpful tool, they are not meant to be analyzed by the clients. You create the analysis for yourself, but in order to tell the client a story based on it (a small sketch of this step follows the lesson below). A report is not just a container of tables and graphs; it is a guide that leads the reader and tells a story, revealing the insights, irregularities, and directions for improvement that characterize the defined use case (business problem). So, next time you create an EDA report, ask yourself: what is the value of this report? It is useless if you don't have a basic understanding of how and why you produced it.

Lessons learned: create a story that will guide the reader or listener through the analysis, from the problem setup, through the methodology, and finally to the insights.
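As a small sketch of that "analysis for yourself" step, assuming a hypothetical cleaned extract with a numeric 0/1 churned flag (all column names are illustrative):

```python
# Quick correlations and distribution plots on a hypothetical cleaned dataset.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers_clean.csv")  # hypothetical EDA-ready extract

# Correlation of numeric features with the (0/1) target, sorted by strength.
numeric = df.select_dtypes("number")
print(numeric.corr()["churned"].drop("churned").sort_values())

# Distributions split by the target: useful for you, rarely shown raw to the client.
for col in ["tenure_months", "monthly_spend"]:
    df[df["churned"] == 0][col].plot(kind="kde", label="retained")
    df[df["churned"] == 1][col].plot(kind="kde", label="churned")
    plt.title(col)
    plt.legend()
    plt.show()
```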

Present EDA Insights

This is your moment to shine. When you present EDA insights, you have to make a point: why they are useful, what new learnings were obtained, and how they can be used for future analysis and modeling. In most cases, things that seem weird or unexpected to you are completely normal to the clients, since they know much more about their business, and sometimes it is the other way around. The idea is to use EDA as a guideline for defining the next actions and realizing the use case. Collect feedback on the analysis and insights you presented; sometimes enrichments, further data cleaning, or other modifications need to be introduced.

Lessons learned: make a point (or multiple ones) with your analysis, and collect feedback on the analysis you have performed. 

The real deal – referring materials

I have found this comprehensive list of automated EDA libraries and have used some of them myself (all-time favs: Pandas-profiling, Sweetviz, and Yellowbrick). Additional links can be found in the following list, and a minimal usage sketch follows it:

  1. One from my early days: https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
  2. A priceless list of questions to ask before and during EDA: https://levelup.gitconnected.com/cozy-up-with-your-data-6aedfb651172
  3. Feel the power of Sweetviz: https://towardsdatascience.com/powerful-eda-exploratory-data-analysis-in-just-two-lines-of-code-using-sweetviz-6c943d32f34
  4. Not strictly EDA, but related to it: https://medium.com/data-science-community-srm/machine-learning-visualizations-with-yellowbrick-3c533955b1b3
  5. If you want to see how others do it: https://towardsdatascience.com/creating-python-functions-for-exploratory-data-analysis-and-data-cleaning-2c462961bd71
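For completeness, here is a minimal usage sketch for two of the libraries mentioned above; the input file and target column are hypothetical, and note that Pandas-profiling has since been renamed to ydata-profiling, which the import below assumes.

```python
# Automated EDA reports with ydata-profiling (ex pandas-profiling) and Sweetviz.
import pandas as pd
import sweetviz as sv
from ydata_profiling import ProfileReport

df = pd.read_csv("customers_clean.csv")  # hypothetical EDA-ready extract

# One-call profiling report: distributions, missing values, correlations, warnings.
ProfileReport(df, title="Customer EDA").to_file("profile_report.html")

# Sweetviz report, optionally focused on a target column such as a churn flag.
sv.analyze(df, target_feat="churned").show_html("sweetviz_report.html")
```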

Tell me about your experience; I would like to hear your best practices and how you overcome the challenges you often face.

Cheers! 🙂

Cover photo taken from: https://unsplash.com/@clarktibbs