How to Start with Data Science?

After I took part in a meetup at the end of March titled “Data Science – what is it?”, a lot of people asked me to send them some introductory materials to help them get started with learning. It took me a long time to sit down and start compiling a list, because there are many sources, …

Dash by Plotly

Let’s say you have been working on a client segmentation project. Your client segments are well separated, and your final task is to present the findings and results to the project stakeholders. Usually, none of them has the technical expertise to understand your code, so you need to visualize …

Friday talks: Women in Data Science

Hello, fellas. Cool down, I’m not going to talk about extreme feminism and gender (in)equality. 🙂 This post is going to be about the extraordinary women I had a chance to meet at the Women in Data Science conference, held in Subotica this April. I truly believe that these girls deserve to be heard, as …

Hello Docker

Having spent a couple of weeks on data preparation and developing that particular machine learning model, you are finally ready to show off some really good results to your boss. You have your notebooks with lines of code doing magic, maybe some reports in Excel, amazing visualizations in Plotly, etc. It’s 5 minutes till your presentation …

Friday talks: A Data Science Project

This post is not going to be about another Data Science course you should enroll in. It’s not going to be about various skills you should build in order to develop a Data Science project, either. Considering the title of this post – A Data Science Project – I tried to create a pun. Your …

Data Exploration with Pandas (Part 2)

In the previous article, I covered some introductory material and basic Pandas capabilities. In this part, the main focus will be on DateTime values. I am also going to introduce you to some of the grouping and merging possibilities in Pandas. For this purpose, here is another dataset downloaded from the UCI Repository, which contains date and time …

Anomaly detection

Anomaly detection is a challenging problem often faced in data analysis. Whether the task is clustering, classification or some other machine learning problem, it is very important to identify anomalies and handle them in some way in order to achieve optimal model performance. Furthermore, anomalies could often influence the analysis …

Data Exploration with Pandas (Part 1)

If you ever decide to become someone who is into big data, surely you can do it without having a clue about pandas. But that’s not the brightest solution, because why would you leave aside something that’s gonna make you a lot better? Pandas is a well-known library for manipulating datasets that contain numerical and …

Handling missing data

Hi, everyone. Although I planned for my next post to be about anomaly detection and the treatment of anomalies, I ran into another type of problem that quickly escalated into a huge issue affecting the modelling and the accuracy of the results, and I couldn’t resist sharing my experience as soon as possible. In this post, I will be talking about …

Forecasting with VAR and Prophet

In my previous post, I tried to present the ARIMA model for forecasting. It is based on the concepts of autoregression and moving average: respectively, regressing a variable on its own lagged values and modelling the error as a linear combination of past error terms. In this post, …

Interactive log analysis with Apache Spark

The Internet is becoming the largest global shop across markets, and anyone who offers products and services of any kind prefers web shops to become the primary outlets for supplying customers. This leads to a reduction in the number of employees and traditional brick-and-mortar branches and a reduction in costs, so it …

Forecasting with ARIMA

One of the most challenging machine learning problems is predicting an output based on the history of its previous values. The complexity of the problem multiplies as new features and constraints are added to the analysis. Thus, in time series analysis it is not always enough to use previous values only; there are often many features …
