﻿ How to Analyze and Report above AVERAGE

# How to Analyze and Report above AVERAGE

Following article is an excerpt from my best selling book: Maths and Stats for Web Analytics and Conversion Optimization which I am sharing it here for the first time, to benefit the wider audience: Nobody wants to be average and yet we all love averages.

That is why our analytics reports are all jam packed with averages: In order to analyze and report above average, we would first need to stop being obsessed about all the metrics which are ‘average’ and take the insight they provide with a huge grain of salt.

Any set of measurement has two important properties:

1. The central value

We calculate the central value with the aim to determine a typical value in a data set.

A data set is a set of observed values for a particular variable (say avg. time on site).

We measure the spread with the aim to determine how similar or varied the set of observed values are in a data set.

If the set of observed values are similar then the average (or mean) can be a good representative of all the values in the data set.

If the set of observed values vary by a large degree then the average (or mean) is not a good representative of all the values in the data set.

We calculate the central value through Mean, Median and Mode.

We measure the spread of data values through Range, Interquartile Range (IQR), Variance and Standard Deviation.

## Mean

The mean (also known as arithmetic mean or population mean) is simply an average of the numbers.

It is denoted by Greek letter µ (“mu”).

It is calculated as

Mean = sum of numbers /count of numbers

For example, let us suppose a website has got 5 web pages: Now bounce rate of the site = (35+40+0+48+100)/5 = 223/5 = 44.6%

But is 44.6% a true bounce rate?

No.

Look at the distribution of bounce rate across all the web pages.

Two web pages, page 3 and page 5 have extreme values of 0% and 100%. We call such values as ‘outliers’ in statistics.

Outliers have the sadistic ability to skew ‘averages’. Now average time on the site = (350+400+500+480+36000)/5 = 37730/5 = 7546 = 2 hrs 6 minutes

But is ‘2hrs 6 minutes’ a true average time on a page?

No.

Look at the distribution of average time across all the web pages.

The web page ‘page 5’ has extreme values of 36000.

Again the outlier ‘36000’ is skewing our average metric.

This is the fundamental problem with averages and the tragedy is that Google Analytics use this metric throughout its reports: ‘Average time on page’, ‘average time on site’, ‘site average’…..You can’t really escape from ‘averages’.

As long you keep analyzing and reporting these average metrics you will get average results.

So what is the solution then?

## Median

It is a middle number in a sorted list of numbers.

For example, let us suppose a website has got 5 web pages: Let us first sort the list:

0%, 35%, 40%, 48%, 100%

If we calculate the median (instead of mean) of this data set then it will be 40%.

Now is 40% a true representative of a typical bounce rate of the site?

Yes.

This is because unlike mean, the median (or middle value) is not impacted by outliers (in our case: 0% and 100%).

Similarly, Let us first sort the list:

350, 400, 480, 500, 36000

Here the middle number is 480.

So median of the data set is 480.

Now is 480 seconds (or 8 minutes) a true representative of a typical time spent on a web page?

Yes.

This is because unlike mean, the median (or middle value) is not impacted by outliers (in our case: 36000)

You can always download Analytics data/report into excel and calculate the median of any data set (n matter how large) through MEDIAN excel function.

However calculating median of each and every data set all day long can be very time consuming and not practical for many.

So what is the solution?

The solution is, you first measure the spread of the data values in a data set and then decide whether or not you can trust the average value reported by your analytics tool like Google Analytics.

There are two ways of measuring the spread:

1. You look at the distribution of values in a data set, find and eliminate outliers (or extreme values). I use this method majority of the time. We measure the spread by calculating the ‘Range’ which is simply the difference between maximum value and minimum value in a data set.

If you look at the chart above, the minimum value is something like 4 minutes and 30 seconds and the maximum value is something like 9 minutes and 30 seconds.

So Range is:

9 minutes 30 seconds – 4 minutes 30 seconds = 5 minutes

If you look at the chart again, there is one outlier (or extreme value) i.e. 9 minutes and 30 seconds.

This outlier is skewing the ‘average visit duration’ because it has increased the value of range.

If we discount this outlier than the new maximum value would be something like 5 minutes 30 seconds.

So now Range is:

5 minutes 30 seconds – 4 minutes 30 seconds = 1 minute

Small range indicates that the central value (in our case the average visit duration) is a better representative of the typical value in a data set.

So if we discount the outlier and then calculate the average visit duration then we will get a better central value or typical value.

That’s why it is important that we look at the distribution of values, calculate the spread and identify and discount outliers before we choose to trust an average metric/value.

Granted this is not the most accurate way to measure spread and determine the central value but it is practical and works esp. when you have to look at hundreds of reports day and night and you don’t have time to calculate median or spread through IQR.

So instead of blindly relying on averages you look at the distribution of data points.

Determine how narrow or wide spread the distribution of values is in a data set by calculating the ‘Range’.

A very wide spread distribution means you can’t rely on the average metric.

Another example: Average Rank

Not only average metrics haunt Google Analytics reports; you can also find them haunting Google Search Console reports: Look at the distribution of ranking positions: If you are ranking from position 2 to 3rd+ page (or better say position 2 to 30+ position) for a search query then you can’t rely on an average value.

This is because the range of ranking positions is too large.

You don’t need to manually calculate the range here.

It is quite evident from the distribution.

That’s why I urge to look at the distribution.

If you won’t measure the spread of data values you will never know whether or not your average value is a true representative of the typical value in a data set.

That’s why it is important that you calculate both the central value and the spread of the data values.

Note: You can calculate range in excel by using the formulas Max and Min.

For example:  =MAX(F4:P4)-MIN(F4:P4). Here F4:P4 is a cell range.

Max() returns the largest value and Min() returns the lowest value in a set of values.

2. The more difficult and time consuming way of calculating spread is through IQR, variance or standard deviation.

If you have a very large data set with lot of outliers than you can’t depend upon the visual method I explained above to determine spread of data values.

You then use IQR, variance or standard deviation to calculate the spread.

I recommend using IQR because it is a better measure of spread than the range or standard deviation as it is less likely to be distorted by outliers.

So you calculate the IQR and then decide whether you can rely on the average value reported by your analytics tools.

In order to understand IQR, you first need to understand Quartiles.

A quartile is one of the four equal groups in which a data set can be divided. For example consider the following ordered data set:

 4 6 10 14 15 16 17 17 18 20 20

Here the point between the lowest 25% of the values is called the 25th Percentile or the lower Quartile.

The lower quartile is denoted by Q1

The point between the 50% of the values is called the 50th Percentile or the second Quartile.

This second quartile is actually the median. So median is also denoted by Q2

The point between the lowest 75% of the values is called the 75th Percentile or the Upper Quartile.

The upper quartile is denoted by Q3.

The difference between the Upper quartile and lower quartile is called the Interquartile Range.

So, IQR = Q3-Q1

In Excel 2013 there is a function called QUARTILE through which you can calculate Q1, Q3 and eventually IQR.

Syntax: =QUARTILE (array, quart)

Here ‘array’ is the range of cells which contains the data set.

‘Quart’ is the parameter which is used to specify which quartile to return.

It can have three values: First Quartile, Median Value and Third Quartile as shown below: Through QUARTILE function you can calculate the first and third quartile. Once you have done this then find IQR using the formula Q3-Q1: The data values that deviate from the middle value by more than twice the IQR are called outliers.

The data values that deviate from the middle value by more than 3.5 times the IQR are called ‘far outliers’.

In order to get a better understanding of how IQR works, you must know how it is calculated manually.

The following video explains calculating IQR manually.

Related Post: Common Google Analytics Mistakes that kill your Analysis, Reporting and Conversions

## Segmentation – Powerful Method to fight ‘AVERAGES’

Another powerful method to reduce the negative impact of ‘average’ metrics on your analysis and business decision is ‘Data Segmentation’.

Segment like Hell.

The more you will segment the data, the smaller will be the data set and the data values will be more close to the mean or average value.

In layman language, the more you will segment the data; more accurate will be your average metrics.

Because of this reason, you will get a better insight if you analyze the Goal conversion rate of organic search for each of your goal in your target market (say New York) than the conversion rate of the organic search for all of the locations from which your site gets traffic.

Like it or not but you learned lot of statistics in this post to fight averages.

In order to become above average in marketing or analytics, you need to learn even more statistics.

“Analyzing data without a basic understanding of statistics will always almost result in erroneous conclusions. “ – That’s my theory

I have proved this theory time and again in my posts:

I can’t stress enough the importance of statistics and its supersets econometrics and data science in solving real life problems.

Let me give you one good example.

According to the law of diminishing marginal utility, the first unit of consumption of a good/service produces more utility than second and subsequent units.

Which means the very first article that you will read on a topic say ‘Google authorship’ will produce more benefit than second and subsequent articles on the same topic.

So more articles you will read on ‘Google Authorship’, the less you will benefit from it.

Then soon you will reach the point of diminishing returns and once you crossed this point, your efficiency will start decreasing and you will be less productive.

Needless to say, in our industry every new shiny thing/topic (from Pinterest to Google Authorship) is tortured to death in the name of blogging and thought leadership and we tend to read every new article on the same topic in a hope to gain something new.

But at the same time we forget how the law of diminishing marginal utility is making us less and less productive with each additional unit of consumption.

Reading more will give you less time to do more and if you do less and read more than you will learn less.

It generally means you learn less.

That’s why I suggest reading less.

So here we go.

I just applied the law of diminishing marginal utility in solving a real life problem (increasing productivity) and saved your countless hours.

## Do you know the difference between Web Analytics and Google Analytics?

99.99% of course creators themselves don’t know the difference between Web analytics, Google Analytics (GA) and Google Tag Manager (GTM).

So they are teaching GA and GTM in the name of teaching Web analytics.

They just copy each other. Monkey see, monkey do.

But Web analytics is not about GA, GTM.

It is about analyzing and interpreting data, setting up goals, strategies and KPIs.

Web Analytics is the core skill. Google Analytics is just a tool used to implement ‘Web Analytics’.

You can also implement ‘Web analytics’ via other tools like ‘adobe analytics’, ‘kissmetrics’ etc.

Using Google Analytics without the good understanding of ‘Web analytics’ is like driving around in a car, in a big city without understanding the traffic rules and road signs.

You are either likely to end up somewhere other than your destination or you get involved in an accident.

You learn data analysis and interpretation from Web analytics and not from Google Analytics.

The direction in which your analysis will move, will determine the direction in which your marketing campaigns and eventually your company will move to get the highest possible return on investment.

You get that direction from ‘Web analytics’ and not from ‘Google Analytics’.

You learn to set up KPIs, strategies and measurement framework for your business from ‘Web analytics’ and not from ‘Google Analytics’.

So if you are taking a course only on 'Google Analytics’, you are learning to use one of the tools of ‘Web analytics’. You are not learning the ‘Web analytics’ itself.

Since any person can learn to use Google Analytics in couple of weeks, you do no get any competitive advantage in the marketplace just by knowing GA.

You need to know lot more than GA in order to work in Web analytics and marketing field.

So what I have done, if you are interested, is I have put together a completely free training that will teach you exactly how I have been able to leverage web/digital analytics to generate floods of news sales and customers and how you can literally copy what I have done to get similar results.

## Here what You'll Learn On This FREE Web Class!

1) Why digital analytics is the key to online business success

2) The number 1 reason why most marketers are not able to scale their advertising and maximize sales.

4) Why you won’t get any competitive advantage in the marketplace just by knowing Google Analytics.

5) The number 1 reason why conversion optimization is not working for your business.

7) How to learn and master digital analytics and conversion optimization in record time. ## My best selling books on Digital Analytics and Conversion Optimization

Maths and Stats for Web Analytics and Conversion Optimization
This expert guide will teach you how to leverage the knowledge of maths and statistics in order to accurately interpret data and take actions, which can quickly improve the bottom-line of your online business.

Master the Essentials of Email Marketing Analytics
This book focuses solely on the ‘analytics’ that power your email marketing optimization program and will help you dramatically reduce your cost per acquisition and increase marketing ROI by tracking the performance of the various KPIs and metrics used for email marketing.

Attribution modelling is the process of determining the most effective marketing channels for investment. This book has been written to help you implement attribution modelling. It will teach you how to leverage the knowledge of attribution modelling in order to allocate marketing budget and understand buying behaviour.

## Himanshu Sharma

### Digital Marketing Consultant and Founder of Optimizesmart.com Himanshu helps business owners and marketing professionals in generating more sales and ROI by fixing their website tracking issues, helping them understand their true customers purchase journey and helping them determine the most effective marketing channels for investment.

He has over 12 years experience in digital analytics and digital marketing.

He was nominated for the Digital Analytics Association's Awards for Excellence.

The Digital Analytics Association is a world renowned not-for-profit association which helps organisations overcome the challenges of data acquisition and application.

He is the author of four best-selling books on analytics and conversion optimization:

error: Alert: Content is protected !!