Data Exploration with Pandas (part 1)

Sinisa Jovic Technology 0 Comments

If you ever decide to get into big data, you can certainly do it without having a clue about pandas. But that’s not the brightest approach: why leave aside something that’s going to make you a lot better? Pandas is a well-known library for manipulating datasets that contain numerical and tabular data, which makes it a pretty good-to-know library for data engineers and data scientists. In part 1 we’re going to go through some of the basics to introduce you to the capabilities of pandas. For the purpose of this article, as an example dataset I’ve used …

Handling missing data

Valentina Djordjevic Technology 0 Comments

Hi, everyone. Although I planned for my next post to be about the detection and treatment of anomalies, I ran into a different type of problem that quickly escalated into a huge issue affecting the modelling and the accuracy of the results, and I couldn’t resist sharing my experience as soon as possible. In this post, I will be talking about the problem of handling missing data. Missing data is an everyday problem for an analyst. We get used to it, and most often we just treat it with some standard techniques and continue with the analysis. That’s what I did, until I realized it’s not …

Forecasting with VAR and Prophet

Valentina Djordjevic Technology 2 Comments

In my previous post, I tried to present the ARIMA model for forecasting. It is based on the concepts of autoregression and moving average: the former regresses a variable on its own lagged values, and the latter models the error as a linear combination of error terms that occurred in the past. In this post, we’re going to talk about VAR and Prophet as alternative models. The main problem was that the ARIMA model we developed could not be applied to predict the output for all of the different objects we ran the forecasting for, which was the principal idea. As we dug …

Interactive log analysis with Apache Spark

Sinisa Jovic Technology 0 Comments

The Internet is becoming the largest global shop across markets, and anyone offering products and services of any kind prefers web shops to become the primary outlets for supplying customers. This leads to fewer employees and traditional brick-and-mortar branches and to lower costs, so it is clear that analysis of customer behavior on digital and online channels is of great importance. For this reason, it should not be surprising that many companies treat this kind of analysis as a basic need. In this post I will not focus that much …

Forecasting with ARIMA

Valentina Djordjevic Technology 8 Comments

One of the most challenging machine learning problems is predicting some output based on the history of its previous values. The complexity of the problem multiplies as new features and constraints are added to the analysis. Thus, in time series analysis it is not always enough to use previous values only; there are often many other features that could affect the output to be predicted. Numerous models and algorithms have been developed for this kind of problem. In this article, and the series of articles that follows, several models, such as ARIMA, VAR and Facebook Prophet, will be introduced and finally …