Data Exploration with Pandas (Part 2)

Sinisa Jovic Technology Leave a Comment

In the previous article, I covered some introductory material and basic Pandas capabilities. In this part, the main focus will be on DateTime values. I am also going to introduce you to some grouping and merging possibilities in Pandas. For this purpose, here is another dataset downloaded from the UCI Repository, which contains date and time columns. This time my data comes from an Excel file, so the way I read it is slightly different from the previous post. After importing the pandas library (see the previous blog post), I’m going to use the read_excel() function and display a data sample with head() …
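As a rough illustration of the DateTime and grouping steps this teaser mentions, here is a minimal sketch. It does not use the article's dataset: the column names (Date, Time, value) and the rows are hypothetical, and the frame is built in memory so the snippet runs without an Excel file. With a real file, the frame would come from pd.read_excel() instead.

```python
import pandas as pd

# Hypothetical sample standing in for the Excel sheet; with a real file the
# frame would be loaded with pd.read_excel(...) instead of built by hand.
df = pd.DataFrame({
    "Date": ["2004-03-10", "2004-03-10", "2004-03-11", "2004-04-01"],
    "Time": ["18:00:00", "19:00:00", "18:00:00", "09:00:00"],
    "value": [2.6, 2.0, 2.2, 1.5],
})

# Combine the separate date and time columns into a single datetime column.
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Display the first rows, as head() does for any DataFrame.
sample = df.head(3)
print(sample)

# Group by calendar month and average the measurements.
monthly_mean = df.groupby(df["timestamp"].dt.month)["value"].mean()
print(monthly_mean)
```

Once the values are real datetimes, the .dt accessor exposes parts such as month, day or hour, which is what makes grouping by time period straightforward.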

Big Data and Banking: How Our Data Protects Us

darko Business Leave a Comment

Some of the main tasks for bankers are, among others, keeping, preserving, deriving more from less, and making the best of what clients deposit with them in great confidence and trust. Almost the same definition could be applied to Data Scientists dealing with big data. They also deal with keeping, preserving, deriving more conclusions from little information, and getting the best out of the existing databases, which they handle with great care and confidentiality. Maybe it is because of this similarity in definitions, or maybe because of the fact that the databases in the banks are …

Why the Telco Industry seems destined for Big Data

darko Business Leave a Comment

Man is defined in numerous scientific ways, but one definition seems unchanged since the beginning of time. Man is the creature that communicates, from the first attempt to speak until the last tale told to grandchildren. “Telecommunications is defined as the exchange of information between the source and the destination, with the use of technology. The ways of transmitting vary and are getting more complicated as communication becomes multi-dimensional. It all started with smoke signal communication, which is considered to be the first ‘digital’ communication ever”, Vip Mobile engineer Djordje Begenisic describes the industry …

Darko Marjanovic, Things Solver, Data Science

Success Formula: Science, Business and Programming

Milos Milovanovic Business Leave a Comment

In every business, the ultimate dream is the one about the magic success formula. The search for ingredients is very extensive, but they often live next door. So now the greater “magic” becomes the question of using, understanding and connecting those dots. That “magic” even has a name. “Data Science is actually the applied science in business and programming, the attempt to learn as much as we can from the data we collect”, says Things Solver CEO Darko Marjanovic. “We try to learn from the data, but not to stop there. We use that knowledge to make useful predictions, …

Anomaly detection

Valentina Djordjevic Technology 3 Comments

Anomaly detection is a very challenging problem often faced in data analysis. Whether it is about clustering, classification or some other machine learning problem, it is of great importance to identify anomalies and handle them in some way in order to achieve optimal model performance. Furthermore, anomalies can often influence the analysis results, which can lead to wrong conclusions and affect important business decisions. Thus, in every data analysis, it is required to accurately define anomalous behaviour in a certain domain, apply an appropriate anomaly detection model, extract anomalies from the rest of …

Data Exploration with Pandas (part 1)

Sinisa Jovic Technology Leave a Comment

If you ever decide to become someone who is into big data, surely you can do it without having a clue about pandas. But that’s not the brightest solution, because why would you leave aside something that’s gonna make you a lot better? Pandas is a well-known library for manipulating datasets that contain numerical and table structures, which makes it a pretty good-to-know library for data engineers and data scientists. In part 1 we’re gonna go through some of the basic stuff to introduce you to the Pandas capabilities. For the purpose of this article, as an example dataset I’ve used …

Handling missing data

Valentina Djordjevic Technology Leave a Comment

Hi, everyone. Although I planned for my next post to be about anomaly detection and the treatment of anomalies, I faced another type of problem that quickly escalated into a huge issue affecting the modelling and the accuracy of the results, and I couldn’t resist sharing my experience as soon as possible. In this post, I will be talking about the problem of handling missing data. Missing data represents an everyday problem for an analyst. We got used to it, and most often we just treat it with some standard techniques and continue with the analysis. That’s what I had done, until I realized it’s not …

Forecasting with VAR and Prophet

Valentina Djordjevic Technology 4 Comments

In my previous post, I tried to present the ARIMA model for forecasting. It was based on the concepts of autoregression and moving average, which respectively combine the regression of a variable on its lagged values with the calculation of the error as a linear combination of error terms that occurred in the past. In this post, we’re going to talk about VAR and Prophet as alternative models. The main problem was that the ARIMA model we developed could not be applied to predict the output for all of the different objects we ran the forecasting for, which was the principal idea. As we dug …
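To make the two ingredients named in this teaser concrete, here is a minimal sketch of an ARMA(1,1) recursion in plain Python. It is an illustration only, not the model from the post: the function name simulate_arma11 and the coefficients c, phi and theta are hypothetical choices, and a full ARIMA model would additionally difference the series (the "I" part) before applying these terms.

```python
def simulate_arma11(noise, c=0.0, phi=0.5, theta=0.5):
    """Generate y_t = c + phi * y_{t-1} + theta * e_{t-1} + e_t.

    phi weights the lagged value (autoregressive part) and theta weights
    the previous error term (moving-average part).
    """
    series = []
    prev_y = 0.0  # y_{t-1}, assumed to be 0 before the first observation
    prev_e = 0.0  # e_{t-1}, assumed to be 0 before the first observation
    for e in noise:
        y = c + phi * prev_y + theta * prev_e + e
        series.append(y)
        prev_y, prev_e = y, e
    return series


# A single unit shock followed by silence shows how both parts carry it forward.
shocks = [1.0, 0.0, 0.0, 0.0]
series = simulate_arma11(shocks)
print(series)
```

Feeding in a single shock makes the roles visible: the moving-average term reacts to the shock one step later and then disappears, while the autoregressive term keeps decaying geometrically.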

Interactive log analysis with Apache Spark

Sinisa Jovic Technology Leave a Comment

The Internet is becoming the largest global shop across markets, and anyone who offers products and services of any kind prefers web shops to become the primary outlets for supplying customers. This leads to a reduction in the number of employees and traditional brick-and-mortar branches, and to a reduction in costs, so it is clear that the analysis of customer behavior on digital and online channels is of great importance. For this reason it should not be surprising that many companies accept this kind of analysis as a basic need. In this post I will not focus that much …

Forecasting with ARIMA

Valentina Djordjevic Technology 11 Comments

One of the most challenging machine learning problems is predicting some output based on the history of its previous values. The complexity of the problem multiplies as new features and constraints are added to the analysis. Thus, in time series analysis it is not always enough to use previous values only; there are often many features that could impact the output to be predicted. There are numerous models and algorithms developed for this kind of problem. In this article, and the series of articles that follows, several models, such as ARIMA, VAR and Facebook Prophet, will be introduced and finally …