Statistics – Understanding the Levels of Measurement

One of the most important and basic steps in learning statistics is understanding the levels of measurement of variables. Let’s take a step back and first look at what a variable is. A variable is any quantity that can be measured and whose value varies across the population. For example, if we consider a population of students, each student’s nationality, marks, grades, etc. are all variables defined for the entity student, and their values differ from student to student. Looking at the larger picture, if we want to compute the average salary of US citizens, we can either record the salary of every person and compute the average, or choose a random sample from the population, compute the average salary for that sample, and then use statistical tests to draw conclusions about the wider population.
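To make the sampling idea concrete, here is a minimal Python sketch that compares a sample mean with the mean of the full population. The salary figures are synthetic and generated only for illustration; the distribution parameters and the sample size of 1,000 are assumptions, not real US data.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same numbers

# Synthetic "population" of salaries (made-up values, right-skewed like real incomes).
population = [random.lognormvariate(10.8, 0.6) for _ in range(100_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
```

The two means should land close to each other even though the sample covers only 1% of the population, which is why conclusions can be drawn from a sample instead of a full census.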

The type of statistical test that can be used to draw a conclusion about the wider population depends on the level of measurement of the variable under consideration. The level of measurement describes the mathematical nature of a variable, that is, how the variable is measured and which comparisons and arithmetic operations on its values are meaningful.

Broadly, there are four levels of measurement for variables: nominal, ordinal, interval, and ratio.
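As a rough illustration of why the level matters, the Python sketch below picks a measure of central tendency according to the level of measurement, using the conventional level names nominal, ordinal, interval, and ratio. The function name `central_tendency` and the sample data are hypothetical, and ordinal values are assumed to be encoded as integer ranks.

```python
import statistics

def central_tendency(values, level):
    """Return a summary that is meaningful for the given level of measurement."""
    if level == "nominal":
        # Categories have no order, so only the most frequent value is meaningful.
        return statistics.mode(values)
    if level == "ordinal":
        # Ordered ranks support a middle value; median_low returns an actual data point.
        return statistics.median_low(values)
    if level in ("interval", "ratio"):
        # Differences between values are meaningful, so the arithmetic mean applies.
        return statistics.mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")

# Made-up student data matching the variables mentioned earlier.
nationalities = ["US", "IN", "US", "FR"]   # nominal
grade_ranks = [1, 2, 2, 3, 4]              # ordinal, assumed encoded as ranks
marks = [70, 80, 90]                       # ratio

print(central_tendency(nationalities, "nominal"))
print(central_tendency(grade_ranks, "ordinal"))
print(central_tendency(marks, "ratio"))
```

Taking the mean of nationalities would be meaningless, which is the kind of mistake that knowing the level of measurement helps avoid.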


Comparing Real Time Analytics and Batch Processing Applications with Hadoop MapReduce and Spark

Apache Spark is an engine for fast, large-scale data processing. It claims to run programs up to 100x faster than Hadoop MapReduce in memory and 10x faster on disk. The introduction of the Hadoop MapReduce framework greatly simplified big data management and analysis in a cost-efficient way: using commodity hardware, we can apply a range of algorithms to large volumes of data. However, MapReduce performs poorly when implementing complex, multi-stage algorithms. In this article, we dig deeper to understand why Apache Spark upstages the Apache Hadoop MapReduce framework.

Unified Architecture

The introduction of big data mandated the development of sophisticated tools that run faster and are easy to use. We need such tools for applications such as interactive query processing, ad hoc queries on real-time streaming data, and sophisticated processing of historical data for better decision making.
