The first step in checking whether your data is normally distributed is to plot a histogram and observe its shape. If it looks bell-shaped and symmetric around the mean, you can tentatively assume that your data is approximately normally distributed. However, using histograms to assess normality can be problematic, especially if you have a small dataset. […]
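
As a numeric complement to eyeballing a histogram, one option is to look at sample skewness and excess kurtosis, both of which should be close to zero for normally distributed data. The sketch below is a minimal illustration using only the Python standard library; the function names, the simulated samples and the sample size are assumptions made for this example and do not come from the original post.

```python
import random
import statistics


def sample_skewness(data):
    """Third standardized moment (population form); near 0 for symmetric data."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)


def excess_kurtosis(data):
    """Fourth standardized moment minus 3; near 0 for normal data."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 4 for x in data) / len(data) - 3.0


# Hypothetical data: one sample drawn from a normal distribution and
# one from an exponential (clearly non-normal, right-skewed) distribution.
random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(5000)]
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    print(name, round(sample_skewness(sample), 3), round(excess_kurtosis(sample), 3))
```

With small samples these statistics are noisy as well, so they are best read alongside the plot rather than as a replacement for it.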

# Statistics

## Big Data and the Role of the Data Scientist

The latest adventurers are into analysing Big Data to get invaluable nuggets of insight. They scour vast tracts of data, wrangle it, clean it and analyse it to arrive at insights that a small data set can never reveal. Genomics, physics, Internet marketing and financial data are some of the examples where high […]

## Understanding Hypothesis Testing and p-value

Behavioral scientists, market researchers, astrophysicists and drug testers all seek to better understand their target group. Often it is next to impossible to assess the entire population. Inferential statistical testing is instead done on a sample that exhibits most, if not all, characteristics of the population. This is done using hypothesis testing. Hypothesis (plural form being […]

## What is Hypothesis Testing

Often we want to test the validity of a statement. For example, is the mean return from this mutual fund greater than the mean return from the benchmark? While answering such a question, our interest is not to find the actual mean returns of the mutual fund, but to test whether the statement […]
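
To make the mutual fund question concrete, here is a minimal sketch of a one-sided, one-sample t test of the fund's mean return against a benchmark mean. The return figures, the benchmark value, the 5% significance level and the function name are all hypothetical choices made for illustration; the critical value 1.833 is the standard table entry for 9 degrees of freedom and an upper-tail area of 0.05.

```python
import math
import statistics


def one_sample_t_statistic(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (sample std dev / sqrt(n))."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (mean - hypothesized_mean) / (sd / math.sqrt(n))


# Hypothetical monthly returns (in %) for the fund, and a hypothetical benchmark mean.
fund_returns = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3, 1.0, 1.6]
benchmark_mean = 1.0

# H0: fund mean <= benchmark mean; H1: fund mean > benchmark mean.
# One-tailed critical value for df = 10 - 1 = 9 at alpha = 0.05.
T_CRITICAL_DF9_ALPHA_05 = 1.833

t_stat = one_sample_t_statistic(fund_returns, benchmark_mean)
reject_null = t_stat > T_CRITICAL_DF9_ALPHA_05
print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

The point of the sketch is that the decision depends on how far the sample mean lies from the benchmark relative to its standard error, not on the size of the sample mean alone.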

## How to Read Student’s t Distribution Table (With PDF)

Student’s t distribution table has the following structure: The top row lists the upper-tail areas, while the first column lists the degrees of freedom. The body contains the t values. Note that for a one-tailed test the values are for α, and for a two-tailed test the values are for α/2. Let’s say n = 3, the df = 3 - 1 […]
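
The lookup described above can be sketched in code. The dictionary below is a small excerpt of standard critical values (degrees of freedom 1 to 5, upper-tail areas 0.10, 0.05 and 0.025), not the full table from the post, and the names `T_TABLE` and `critical_t` as well as the chosen significance levels are assumptions for this example.

```python
import math

# Excerpt of a standard t table: outer keys are degrees of freedom,
# inner keys are upper-tail areas, values are critical t values.
T_TABLE = {
    1: {0.10: 3.078, 0.05: 6.314, 0.025: 12.706},
    2: {0.10: 1.886, 0.05: 2.920, 0.025: 4.303},
    3: {0.10: 1.638, 0.05: 2.353, 0.025: 3.182},
    4: {0.10: 1.533, 0.05: 2.132, 0.025: 2.776},
    5: {0.10: 1.476, 0.05: 2.015, 0.025: 2.571},
}


def critical_t(n, alpha, two_tailed=False):
    """Look up the critical t value for sample size n at significance level alpha."""
    df = n - 1  # degrees of freedom for a single sample
    # A two-tailed test splits alpha between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    for area, value in T_TABLE[df].items():
        if math.isclose(area, tail_area):
            return value
    raise KeyError(f"tail area {tail_area} is not in this table excerpt")


# n = 3 gives df = 2; the alpha of 0.05 is a hypothetical choice.
print(critical_t(3, 0.05))                   # one-tailed: column for 0.05
print(critical_t(3, 0.05, two_tailed=True))  # two-tailed: column for 0.025
```

For degrees of freedom or tail areas outside this excerpt, the function raises `KeyError` rather than interpolating.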

## Why Lognormal Distribution is Used to Describe Stock Prices

The concept of lognormal distribution is very closely related to the concept of normal distribution. Let’s say we have a random variable Y. This variable Y will have a lognormal distribution if the natural log of Y (ln Y) is normally distributed. So, we check if the natural logarithm of a random variable is normally […]
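
The definition can be checked with a short simulation: draw ln Y from a normal distribution, exponentiate to get Y, and confirm that Y is always positive and right-skewed. This is a minimal sketch; the parameter values, the seed and the function name `simulate_lognormal` are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def simulate_lognormal(mu, sigma, size, seed=None):
    """Return samples of Y = exp(X), where X ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(size)]


# Hypothetical parameters for ln(Y).
MU, SIGMA = 0.0, 0.5
y_values = simulate_lognormal(MU, SIGMA, size=10_000, seed=42)
log_values = [math.log(y) for y in y_values]

print("min of Y:", min(y_values))                    # always above zero
print("mean of ln Y:", statistics.fmean(log_values))  # close to MU
print("mean of Y:", statistics.fmean(y_values))
print("median of Y:", statistics.median(y_values))   # below the mean: right skew
```

Because exp(x) is positive for every real x, a lognormal variable can never fall below zero, which is the property that makes it a natural candidate for modelling prices.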

## How to Construct a Frequency Distribution

The statistical data that we collect can be presented in the form of a frequency distribution. A frequency distribution refers to summarizing a large data set into a small number of intervals. Let’s take an example to understand how to construct a frequency distribution. Let’s say we have the following 20 observations with us. 1.5, […]
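
A minimal sketch of the construction, assuming equal-width intervals that span the range of the data. The observations below are hypothetical stand-ins rather than the 20 values from the original post, and the function name and the choice of three intervals are likewise assumptions.

```python
def frequency_distribution(data, num_intervals):
    """Group data into equal-width intervals and count observations in each.

    Returns a list of (lower, upper, count) tuples. Each interval includes its
    lower bound; the last interval also includes the maximum value.
    """
    lo, hi = min(data), max(data)
    if lo == hi:
        raise ValueError("all observations are equal; no intervals to build")
    width = (hi - lo) / num_intervals
    counts = [0] * num_intervals
    for x in data:
        # Clamp so that the maximum value lands in the last interval.
        idx = min(int((x - lo) / width), num_intervals - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_intervals)]


# Hypothetical observations.
observations = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
for lower, upper, count in frequency_distribution(observations, 3):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

Choosing the number of intervals is a judgment call: too few hides the shape of the data, too many leaves most intervals nearly empty.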

## Best Linear Unbiased Estimator (B.L.U.E.)

The Need: There are several issues when trying to find the Minimum Variance Unbiased (MVU) estimator of a variable: the Probability Density Function (PDF) is not known; it is difficult to model the PDF; and even in cases where the PDF is known, it is difficult to arrive at the estimate of the minimum variance. The intended […]

## Linear Combinations of Random Variables

The joint distribution of a particular pair of linear combinations of random variables which are independent of each other is a bivariate normal distribution. It forms the basis for all calculations involving arbitrary means and variances relating to the more general bivariate normal distribution. The property of rotational symmetry implies that the joint distribution of […]

## Independent and Identically Distributed Variables

Definition: I.I.D.s, or independent and identically distributed variables, are commonly used in probability theory and statistics and typically refer to a sequence of random variables. If the random variables in the sequence all have the same probability distribution and are independent of each other, then the variables are called independent and identically distributed. This is a […]