The world is moving fast. Immediate access to information and analytics is increasingly demanded by people and companies alike. At the same time, many sources of information now offer their data as real-time streams over the network. Using the streaming technologies of the Big Data ecosystem efficiently allows us to meet these immediacy requirements.
Business Intelligence technologies allow interactive analysis (OLAP), answering different types of queries in near-constant time. However, with the birth of Big Data, the traditional technologies used for this purpose are no longer efficient enough to handle such large volumes of data. To solve the problem of running analytics over billions of rows, technologies such as Kylin (MOLAP) and Lens (ROLAP) are emerging.
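The core idea behind MOLAP engines such as Kylin can be sketched in a few lines: aggregates are precomputed at build time for every combination of dimensions, so queries become constant-time lookups instead of scans over billions of rows. The fact table and dimension names below are purely illustrative, not Kylin's API.

```python
from itertools import combinations
from collections import defaultdict

# Toy fact table: (country, year, sales). Names are illustrative only.
facts = [
    ("ES", 2015, 100), ("ES", 2016, 150),
    ("US", 2015, 200), ("US", 2016, 250),
]

def build_cube(rows):
    """Precompute one aggregate per subset of dimensions (the MOLAP build phase)."""
    cube = defaultdict(int)
    for country, year, sales in rows:
        dims = {"country": country, "year": year}
        for r in range(len(dims) + 1):
            for combo in combinations(sorted(dims), r):
                key = tuple((d, dims[d]) for d in combo)
                cube[key] += sales
    return cube

cube = build_cube(facts)

# Queries are now dictionary lookups, regardless of fact-table size.
print(cube[()])                                   # grand total -> 700
print(cube[(("country", "ES"),)])                 # ES total    -> 250
print(cube[(("country", "US"), ("year", 2016))])  # -> 250
```

The trade-off is classic MOLAP: cube size grows with the number of dimension combinations, in exchange for query latency that does not depend on the number of source rows.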
There are more than 1 billion websites in the world, and the number keeps growing. Processing and analyzing that amount of information is a clear Big Data use case. Using Big Data technologies for semantic processing, we can extract useful information from different types of websites (news, blogs, search engines), performing real-time analytics and integrating the results with our analytical applications.
The connectivity of elements in the "Information Era" is a fact: everything is connected or related in some way. This leads us to use graphs as the best data structure for data where the relationships between elements are decisive in the analysis. Using graph databases and graph algorithms, we can perform very interesting analyses that offer a different perspective on the data.
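As a minimal illustration of the kind of question a graph answers naturally, the sketch below finds the shortest connection path between two people in a toy social network with breadth-first search. The names and the graph itself are hypothetical; a graph database would run equivalent traversals at scale.

```python
from collections import deque

# Tiny illustrative social graph as adjacency lists (names hypothetical).
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave":  ["bob", "carol", "eve"],
    "eve":   ["dave"],
}

def shortest_path(g, start, goal):
    """Breadth-first search: returns one shortest path between two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in g.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection between the two nodes

print(shortest_path(graph, "alice", "eve"))  # ['alice', 'bob', 'dave', 'eve']
```

Expressing the same "degrees of separation" query in SQL would require recursive self-joins; on a graph structure it is a direct traversal.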
On many occasions, the most relevant information hidden in data cannot be obtained with an SQL query or a hand-programmed algorithm, generally because it requires some type of reasoning that traditional queries and algorithms cannot achieve. In these cases, Machine Learning techniques help us raise analytics to a higher level. Some examples are predictive analysis, recommendation systems, and pattern recognition. Our demo showcases several of them.
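To make the recommendation-system example concrete, here is a minimal collaborative-filtering sketch: it finds the user whose rating vector is most similar (by cosine similarity) to a given user, which is the first step of user-based recommendation. The users, items and ratings are invented for illustration.

```python
from math import sqrt

# Hypothetical user -> item ratings (a toy collaborative-filtering dataset).
ratings = {
    "ana":  {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":  {"film_a": 4, "film_b": 5, "film_c": 2},
    "carl": {"film_a": 1, "film_b": 2, "film_c": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def most_similar(user):
    """The neighbor whose tastes align best; basis for recommending their items."""
    others = [(cosine(ratings[user], ratings[o]), o) for o in ratings if o != user]
    return max(others)[1]

print(most_similar("ana"))  # 'ben': their rating vectors point the same way
```

Notice that no query could state this directly: "similarity of taste" is computed, not stored, which is exactly the kind of reasoning the paragraph above refers to.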
The connectivity of the electronic devices we use every day is revolutionizing the way we obtain and process information. The ability of sensors to send their measurements or state to a remote server lets us observe the behavior of a specific process as it happens, and monitor and analyze it in real time. There are many interesting use cases for this kind of solution; in this section we show some of them.
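A minimal sketch of the monitoring side of such a solution: a sliding-window average over an incoming stream of sensor readings, raising an alert when the average crosses a threshold. The window size, threshold and readings are invented for illustration; a production system would run this logic in a stream processor rather than a single process.

```python
from collections import deque

class SensorMonitor:
    """Sliding-window monitor for a stream of sensor readings (toy example)."""
    def __init__(self, window=3, threshold=30.0):
        self.readings = deque(maxlen=window)  # keeps only the last `window` values
        self.threshold = threshold

    def ingest(self, value):
        """Process one reading; return the window average and alert flag."""
        self.readings.append(value)
        avg = sum(self.readings) / len(self.readings)
        return avg, avg > self.threshold

monitor = SensorMonitor(window=3, threshold=30.0)
for temp in [25.0, 28.0, 31.0, 35.0, 40.0]:
    avg, alert = monitor.ingest(temp)
    print(f"temp={temp} avg={avg:.1f} alert={alert}")
```

Averaging over a window instead of alerting on single readings is a common way to avoid false alarms from momentary sensor spikes.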
At StrateBI we believe in the value of Big Data technologies for data processing and for extracting knowledge from data, with the goal of making decision processes easier in any industry. Our team does a great job of Big Data R&D:
We keep up to date with news and scientific articles published about Big Data technologies.
We work both with emerging technologies that we think have great potential and with consolidated ones.
In this way, we detect new features that can improve the behavior or performance of our solutions.
We put the results of the research phase into practice.
We deploy the improvements and validate their application in real use cases, similar to the ones we show in this demo.
Once we have tested the usefulness and robustness of the improvements or new features, we introduce them into our solutions across different projects.
In this way, StrateBI guarantees the use of cutting-edge Big Data technologies, previously tested and improved by our Big Data R&D team.
Apache Hadoop is the most popular Big Data environment. It enables distributed computing on clusters built from low-cost commodity hardware.
The basic, default configuration of a Hadoop cluster includes distributed data storage (HDFS), a resource manager (YARN, Yet Another Resource Negotiator) and, running on top of it, the MapReduce framework, which performs the distributed processing of the data.
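The MapReduce model itself is simple enough to sketch in a few lines. The toy word count below mimics the three phases the framework runs across the cluster: map (emit key-value pairs), shuffle (group values by key) and reduce (aggregate each group). This is a single-process illustration of the model, not Hadoop code.

```python
from collections import defaultdict
from itertools import chain

# Two toy "documents" standing in for files stored on HDFS.
documents = ["big data on hadoop", "hadoop stores big data on hdfs"]

def mapper(doc):
    # Map phase: emit a (word, 1) pair for every word.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle phase: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: aggregate the values for each key.
    return key, sum(values)

mapped = chain.from_iterable(mapper(d) for d in documents)
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)
```

In a real cluster, mappers and reducers run in parallel on many nodes and the shuffle moves data over the network; the logic per record, however, is exactly this small.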
Besides these components, there is another set of higher-level tools for storing and processing data, such as Hive or Spark. They offer abstractions that simplify development for this environment.
As mentioned before, Hadoop is the most popular Big Data environment because it offers a wide range of technologies and a very high level of robustness. It is ideal for the new Data Lake concept, enabling later analytics with powerful BI tools.
Flume is a distributed and reliable system for the efficient collection, aggregation and movement of streaming data.
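A Flume agent is defined declaratively as a chain of source, channel and sink. The fragment below is a hypothetical example of that properties format: it tails an application log and delivers the events to HDFS (the agent name, file paths and capacity values are placeholders, not a recommended production configuration).

```properties
# Hypothetical Flume agent: tails a log file and writes events to HDFS.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

The channel decouples collection from delivery: the source keeps accepting events while the sink drains them at its own pace.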
Kafka is a distributed messaging system based on the publish-subscribe pattern. It is fault tolerant, horizontally scalable, and ideal for stream data processing.
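The essence of the publish-subscribe pattern as Kafka applies it, with per-consumer-group offsets into an append-only log, can be sketched in plain Python. This is a teaching toy, not the Kafka API: in Kafka the log is partitioned, replicated across brokers, and persisted to disk.

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory broker illustrating Kafka-style publish-subscribe."""
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only log of messages
        self.offsets = defaultdict(int)   # (topic, group) -> next offset to read

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, group):
        # Each consumer group keeps its own offset into the log,
        # so several groups can read the same topic independently.
        offset = self.offsets[(topic, group)]
        messages = self.topics[topic][offset:]
        self.offsets[(topic, group)] = len(self.topics[topic])
        return messages

broker = MiniBroker()
broker.publish("sensor-data", {"temp": 21.5})
broker.publish("sensor-data", {"temp": 22.0})
print(broker.consume("sensor-data", "dashboard"))  # both messages
print(broker.consume("sensor-data", "dashboard"))  # [] (already read)
print(broker.consume("sensor-data", "alerting"))   # both messages again
```

The key property shown here is why Kafka suits stream processing: publishing is decoupled from consumption, and independent consumers replay the same stream at their own pace.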
To simplify the management, installation and maintenance of a Hadoop cluster, we work with the two main Hadoop distributions.
A Hadoop distribution is a software package that includes the basic Hadoop components plus other technologies, frameworks and tools, with the option of installing everything through a web application.
For this reason, at StrateBI we recommend using a Hadoop distribution. Hortonworks and Cloudera are currently the leading distributions in the market, so our demo runs on both a Cloudera distribution and a Hortonworks distribution.
Spark implements the MapReduce programming paradigm while making intensive use of RAM instead of disk.
Using Spark, we can improve the performance of MapReduce applications with iterative algorithms, machine learning (MLlib), statistical analysis (the R module), or real-time analytics (Spark Streaming). All of this is included in our demo.
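Why keeping data in RAM helps iterative algorithms can be illustrated with a toy version of Spark's two core ideas: transformations are lazy (they record work without doing it), and a cached dataset is computed once and then reused from memory. This `MiniRDD` class is a conceptual sketch, not the real RDD API.

```python
class MiniRDD:
    """Toy illustration of Spark-style lazy transformations and caching."""
    def __init__(self, compute):
        self._compute = compute  # deferred computation, run only on an action
        self._cache = None

    @classmethod
    def from_list(cls, data):
        return cls(lambda: list(data))

    def map(self, fn):
        # Transformation: lazy, just records the work to be done.
        return MiniRDD(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Materialize once and keep the result in memory (Spark's persist).
        self._cache = self._compute()
        return self

    def collect(self):
        # Action: triggers the computation, or reads the in-memory cache.
        return self._cache if self._cache is not None else self._compute()

numbers = MiniRDD.from_list([1, 2, 3, 4]).map(lambda x: x * x).cache()
# An iterative algorithm can now reuse `numbers` without recomputing it.
print(numbers.collect())       # [1, 4, 9, 16]
print(sum(numbers.collect()))  # 30
```

Classic MapReduce writes intermediate results to disk between jobs; caching the working set in RAM, as sketched here, is what lets Spark run each iteration of an algorithm like gradient descent or PageRank over the same data without paying that I/O cost again.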