Maths and Stats behind Web Analytics – Beginners Guide

 

A long time ago, in a galaxy far, far away, an authority figure questioned a person for his unreasonable use of stats techniques in website analysis. Unfortunately, that person was me. Maths is not really a problem for me until it turns into sets, probability, regression, standard deviation etc., or in other words, the more specialized field of statistics.

So before I move forward, here is a small warning. I am just a student of statistics, not a full-blown statistician. Consequently, my methods could be prone to statistical errors. However, feel free to jump on me if you can catch an error. I dare you ;)

 

Why you need to know the Maths and Stats behind Web Analytics

It is not just about what you should know; it is more about what you are expected to know, especially if you deal with businesses day in, day out. Here are a few questions for your consideration:

Q1. When your website conversion rate jumps from 10% to 12%, is it a 2% rise in conversion rate or a 20% rise in conversion rate?

Q2. Can you double your sales by simply doubling your marketing budget?

Q3. Should you focus on a large number of low-value customers instead of a few high-value customers to maximize profit?

Q4. If campaign ‘A’ conversion rate is 10% and campaign ‘B’ conversion rate is 20%, does it mean campaign ‘B’ is performing better than campaign ‘A’?

Q5. The average time on your website is 5 minutes. Does this mean website visitors spend 5 minutes on average?

I am sure many marketers/analysts will make mistakes while answering these questions. I am sure because I made these mistakes too. It is not like I was born with statistical knowledge. It is something I have acquired over time and am still acquiring. So if you are not sure about the answers to my questions, that is perfectly normal.

However, if I happen to be your client or boss, then it is not normal. Sorry. You have disappointed me, and I now doubt your reports.

 

The grumpy boss may not know that there has been a negative correlation between conversion rate and revenue, or that there is no 2% rise in conversion rate. But he surely knows that his sales are going down and that he is less profitable. And now he has a moral obligation to do the dreaded ‘cost cutting’. Guess whose job is on the line next?

The corporate world is not very forgiving of mistakes made by employees/consultants/agencies. So if we report that the jump in conversion rate from 10% to 12% means a 2% rise in conversion rate, our entire marketing report becomes questionable. We instantly cast a shadow over the rest of our analysis. The thought that will instantly pop up in the mind of the recipient of our report will be: “What else has he done wrong?”

 

Learning Maths and Stats is an excellent way to develop your logical and critical thinking. It makes you a better marketer and, of course, a better analyst. No one can then easily question your reasoning skills, and you become a true NINJA. Following is one example from everyday life:

Which is a better deal?

3 ads placement for $40.12 or 2 ads placement for $30.65

If you calculate and compare the unit prices, you will find that ‘3 ads placement for $40.12’ is the better deal.

So here we go.

 

% Change

This metric is used to calculate the percentage of rise or fall in relation to the old value. We use % change when there is an old value and a new value.

% change = [ (New Value – Old Value)/Old value ] * 100

For example:

 

Month | Visits | Conversions | Conversion Rate
July | 200 | 20 | 10%
Aug | 350 | 42 | 12%
% Change | 75% | 110% | 20%

 

% change = [(350-200)/200]*100 = 75%

% change = [(42-20)/20]*100 = 110%

% change = [(12-10)/10]*100 = 20%

Google Analytics calculates % change (for every type of metric) when you compare the data with past data. So if pages/visit in July was 2 and in August was 4, you won’t report that pages/visit has improved by 2. You will report that pages/visit has improved by 100% (i.e. the percentage change from the old value). Similarly, we calculate % change for avg. visit duration, % new visits, bounce rate, etc.
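If you want to reproduce these numbers outside Google Analytics, here is a minimal Python sketch of the % change formula. The function name `percent_change` is just an illustrative choice. It multiplies by 100 before dividing, which is algebraically the same formula but keeps whole-number inputs like these free of tiny floating-point artifacts.

```python
def percent_change(old_value: float, new_value: float) -> float:
    """% change = [(new value - old value) / old value] * 100."""
    if old_value == 0:
        raise ValueError("% change is undefined when the old value is 0")
    # Same formula, with the * 100 applied before the division
    return (new_value - old_value) * 100 / old_value


# July vs August figures from the table above
print(percent_change(200, 350))  # visits: 75.0
print(percent_change(20, 42))    # conversions: 110.0
print(percent_change(10, 12))    # conversion rate: 20.0
print(percent_change(2, 4))      # pages/visit: 100.0
```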

 

% difference

This metric is used to calculate the difference between two values as a percentage. Use this metric when neither value is more important than the other.

% difference = [|difference between two values|/average of the two values] *100

For example:

 

Month | Visits | Conversion Rate
July | 200 | 10%
Aug | 350 | 12%
% Change | 75% | 20%
% Difference | 54.54% | 18.18%

 

% difference = [|200-350| / ((200+350)/2)] * 100 = (150/275) * 100 = 54.54%

% difference = [|10-12| / ((10+12)/2)] * 100 = (2/11) * 100 = 18.18%


Note:
The vertical bars denote the absolute value, so ignore the minus sign if the difference between the two values is negative.

Many people make the mistake of assuming that % change and % difference are the same thing. They are not, as I have just shown you. We use % change when there is an old value and a new value and we need to know the percentage rise or fall in relation to the old value. Most of the time we use % change in reporting.
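To make the contrast with % change concrete, here is a small sketch of the % difference formula (the function name is illustrative). Because it divides by the average of the two values rather than by an "old" value, swapping the inputs gives the same answer, which is exactly why it suits cases where neither value is the baseline.

```python
def percent_difference(a: float, b: float) -> float:
    """% difference = [|a - b| / ((a + b) / 2)] * 100."""
    average = (a + b) / 2
    if average == 0:
        raise ValueError("% difference is undefined when the two values average to 0")
    return abs(a - b) * 100 / average


print(percent_difference(200, 350))  # visits, July vs Aug
print(percent_difference(10, 12))    # conversion rate, July vs Aug
# Order doesn't matter, unlike % change
print(percent_difference(350, 200) == percent_difference(200, 350))
```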

 

% Error

This metric is used to calculate the magnitude of the error, as a percentage, when comparing an approximate value to an exact value.

% error = [|approximate value – exact value|/|exact value|] * 100

For example:

 

Month | Approximate conversions | Actual conversions | % Error
July | 200 | 110 | 81.82%
Aug | 300 | 200 | 50%

 

I estimated 200 conversions in July, but got 110 conversions. So my % error is [|200-110|/110] * 100 = 81.82%

I estimated 300 conversions in August, but got 200 conversions. So my % error is [|300-200|/200] * 100 = 50%

A practical use of % error is when you are running an experiment. You want the % error to be as low as possible.

Note: The vertical bars denote the absolute value, so ignore the minus sign if the difference is negative.
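The same calculation as a short Python sketch, using the estimated and actual conversions from the table above (the function name is illustrative):

```python
def percent_error(approximate: float, exact: float) -> float:
    """% error = [|approximate - exact| / |exact|] * 100."""
    if exact == 0:
        raise ValueError("% error is undefined when the exact value is 0")
    return abs(approximate - exact) * 100 / abs(exact)


# (month, estimated conversions, actual conversions)
for month, estimate, actual in [("July", 200, 110), ("Aug", 300, 200)]:
    print(f"{month}: {percent_error(estimate, actual):.2f}% error")
```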

 

Percentage Points

We use percentage points when we subtract one percentage from another, to make clear that the change is absolute rather than relative. For example:

Conversion rate jumps from 10% to 12%. So is it a 20% rise in conversion rate or a 2% rise in conversion rate?

It is actually a 20% rise in conversion rate, or a rise of 2 percentage points. It can’t be a 2% rise in conversion rate: a 2% rise from 10% would take it to only 10.2%.
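A tiny sketch that reports both numbers side by side, so you never have to choose between them in a report. Rates are passed in as percentages (10 means 10%); the function name is illustrative.

```python
def rate_change(old_rate_pct: float, new_rate_pct: float):
    """Return (percentage-point change, relative % change) for two rates in %."""
    points = new_rate_pct - old_rate_pct        # absolute difference
    relative = points * 100 / old_rate_pct      # % change relative to the old rate
    return points, relative


points, relative = rate_change(10, 12)
print(f"{points} percentage points, i.e. a {relative}% rise")

# What a genuine 2% (relative) rise from a 10% conversion rate looks like
print(10 * (100 + 2) / 100)
```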

 

Mean

It is also known as the arithmetic mean or population mean. It is simply the average of the numbers. The mean is a type of average, whereas ‘average’ could refer to the mean, median or mode. The mean is denoted by the Greek letter µ (“mu”). This metric is used a lot in Google Analytics under the name ‘average’: ‘Average time on page’, ‘average time on site’, ‘site average’…..

Mean = sum of numbers /count of numbers

For example, let us suppose a website has 5 web pages:

 

Page | Bounce Rate
Page 1 | 35%
Page 2 | 40%
Page 3 | 0%
Page 4 | 48%
Page 5 | 100%

Now bounce rate of the site = (35+40+0+48+100)/5 = 223/5 = 44.6%

Is 44.6% a true bounce rate?

No.

Look at the distribution of bounce rate across all the web pages. Two web pages, page 3 and page 5, have extreme values of 0% and 100%. We call such values ‘outliers’ in statistics. Outliers have the sadistic ability to skew ‘averages’. So if we take these two extreme values out of our calculation, we get a more representative bounce rate for the site:

(35+40+48)/3 = 123/3 = 41%

Similarly,

 

Page | Average time on page (in seconds)
Page 1 | 350
Page 2 | 400
Page 3 | 500
Page 4 | 480
Page 5 | 36000

Now average time on the site = (350+400+500+480+36000)/5 = 37730/5 = 7546 seconds ≈ 2 hrs 6 minutes

Again the outlier ‘36000’ is skewing our average metric. So if we take it out and recalculate the average time on site, we get (350+400+500+480)/4 = 1730/4 = 432.5 seconds ≈ 7 minutes 12 seconds

Therefore whenever we analyze ‘average’ metrics we always:

  1. Look at the distribution
  2. Identify the outliers (i.e. extreme values)
  3. Exclude outliers from the average calculations.

If you don’t do this, you will get muddy analytical insights from your average metrics, such as an ‘average time on site’ of 2 hrs 6 minutes.
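The three steps above can be sketched in Python. The article doesn’t say how to decide what counts as an outlier, so this sketch assumes Tukey’s fences (anything more than 1.5 × the interquartile range beyond the quartiles), computed with the standard library’s `statistics.quantiles(..., method="inclusive")`. That rule, the `k=1.5` default and the helper names are assumptions, not the only option: on samples as small as five pages, a different quartile method can flag different points, so treat the output as a prompt to look at the distribution rather than a verdict. The median is printed alongside because it is a common outlier-resistant alternative to the mean.

```python
import statistics


def mean(values):
    """Mean = sum of numbers / count of numbers."""
    return sum(values) / len(values)


def drop_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Assumption: this outlier rule is a common convention, not the article's.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


bounce_rates = [35, 40, 0, 48, 100]         # % per page (table above)
time_on_page = [350, 400, 500, 480, 36000]  # seconds per page (table above)

for label, data in [("bounce rate", bounce_rates), ("time on page", time_on_page)]:
    kept = drop_outliers(data)
    print(f"{label}: mean={mean(data)}, median={statistics.median(data)}, "
          f"mean without outliers={mean(kept)} using {kept}")
```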

 

Unit Price

It is equal to cost/quantity

Which is a better deal?

3 ads placement for $40.12 or 2 ads placement for $30.65

Calculating and comparing unit prices is a good way of finding the ‘best deal’:

So, we calculate the unit price in each case:

$40.12/3 = $13.37 per ad placement

$30.65/2 = $15.33 per ad placement (rounded)

So if we go for the ‘2 ads placement for $30.65’ deal, we will end up paying more per placement. Consequently, the best deal for us is ‘3 ads placement for $40.12’.
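The unit-price comparison is easy to script. This sketch uses Python’s `decimal` module so that money amounts like $30.65 are handled exactly; the deal labels are just strings copied from the example.

```python
from decimal import Decimal, ROUND_HALF_UP


def unit_price(cost: Decimal, quantity: int) -> Decimal:
    """Unit price = cost / quantity."""
    return cost / quantity


deals = {
    "3 ads placement for $40.12": (Decimal("40.12"), 3),
    "2 ads placement for $30.65": (Decimal("30.65"), 2),
}

prices = {label: unit_price(cost, qty) for label, (cost, qty) in deals.items()}
for label, price in prices.items():
    cents = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    print(f"{label}: ${cents} per ad placement")

best_deal = min(prices, key=prices.get)  # lowest unit price wins
print("Best deal:", best_deal)
```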

 

Gross Profit, Operating Profit, Net Profit, Bottomline Profit…

A lot of marketers make the mistake of reporting these metrics without understanding what they really are and how they are calculated. Check out this classic video on the misreporting of the profit metric:

 

In simplest terms,

Profit = Sales Revenue – Cost
Revenue = price of the product(s) * quantity sold

Gross Profit = Sales Revenue – Direct Cost 

Direct cost can be something like the cost of manufacturing a product.

Operating Profit = Sales Revenue – Operating Cost.

It is the profit before interest and taxes. Operating cost is the ongoing cost of running a business, product or system. It can include both direct and indirect costs.

Net Profit – also known as net income, net earnings or bottomline. It is the profit after interest and taxes.

Net Profit = Sales Revenue – Total cost (this includes any direct and indirect cost + interest + taxes)

When we talk about business bottomline, we are actually talking about the ‘net profit’.
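Here is the same profit hierarchy as a small Python sketch. All figures are hypothetical, and the field names are illustrative. Following the definitions above, operating cost is taken as direct plus indirect costs. Taxes are passed in as a fixed number for simplicity, although in practice they depend on profit.

```python
from dataclasses import dataclass


@dataclass
class ProfitAndLoss:
    """Hypothetical figures for one period."""
    price: float
    quantity_sold: int
    direct_cost: float     # e.g. cost of manufacturing the product
    indirect_cost: float   # other ongoing costs of running the business
    interest: float
    taxes: float

    @property
    def revenue(self) -> float:
        return self.price * self.quantity_sold

    @property
    def gross_profit(self) -> float:
        return self.revenue - self.direct_cost

    @property
    def operating_profit(self) -> float:
        # Operating cost = direct + indirect costs (before interest and taxes)
        return self.revenue - (self.direct_cost + self.indirect_cost)

    @property
    def net_profit(self) -> float:
        # Bottomline: revenue minus total cost, including interest and taxes
        return self.revenue - (self.direct_cost + self.indirect_cost
                               + self.interest + self.taxes)


pnl = ProfitAndLoss(price=50, quantity_sold=1000, direct_cost=20000,
                    indirect_cost=10000, interest=2000, taxes=4000)
print(pnl.revenue, pnl.gross_profit, pnl.operating_profit, pnl.net_profit)
```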

 

Profit Margin

It is also known as net profit margin, net margin or net profit ratio.

Profit Margin = (Net Profit/ Revenue) * 100

A low profit margin indicates a higher risk that a decline in sales will erase the profit and result in a net loss.
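A sketch of why a thin margin is risky. It rests on a simplifying assumption that is not in the article: all costs stay fixed when sales fall. Under that assumption the margin is exactly the percentage drop in sales that would wipe out the profit. The two businesses and their figures are hypothetical.

```python
def profit_margin(net_profit: float, revenue: float) -> float:
    """Profit margin (%) = (net profit / revenue) * 100."""
    if revenue == 0:
        raise ValueError("profit margin is undefined when revenue is 0")
    return net_profit * 100 / revenue


def profit_after_sales_drop(net_profit, revenue, drop_pct):
    """Net profit after sales fall by drop_pct percent.

    Simplifying assumption: all costs stay fixed when sales fall.
    """
    return net_profit - revenue * drop_pct / 100


# Two hypothetical businesses: same revenue, very different margins
for net_profit in (14000, 1000):
    margin = profit_margin(net_profit, 50000)
    after = profit_after_sales_drop(net_profit, 50000, drop_pct=5)
    print(f"margin {margin}% -> profit after a 5% sales drop: {after}")
```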

 

The law of diminishing returns and your Marketing Budget

According to the law of diminishing returns,

If you keep adding more of one unit of production to a productive process while keeping all other units constant, you will at some point get lower per-unit returns.

So, for example, if you keep pumping more money into a Facebook campaign without changing the present form of the campaign, at some point you will reach the point of diminishing returns. Once you cross this point, your conversion rate will go down and your cost per acquisition will go up.

So when you are thinking of increasing the budget of a campaign by a considerable amount, think of adding more ads and targeting more keywords. In this way you change multiple units of production and can stay away from the point of diminishing returns.

 

How to determine the point of diminishing returns

To determine the point of diminishing returns, you need to gradually add more of one unit of production to the production process. If you add units rapidly, you will never know when you crossed the point of diminishing returns and started losing money. So increase your budget gradually.

Understand that just doubling the budget of a high-performing campaign may not result in a proportional increase in performance. You need to do a lot more than just increase the budget. Consider running more ads, targeting more keywords or entering new markets to stay away from the point of diminishing returns. So now I guess I have answered my question: Can you double your sales by simply doubling your marketing budget?
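One way to act on the "increase gradually" advice, sketched under assumptions: you raise the budget in small steps with everything else unchanged, record conversions at each step, and compare the marginal CPA (extra spend divided by extra conversions) with a target CPA. The budget steps, conversion counts and the $40 target are all hypothetical, and "first step whose marginal CPA exceeds the target" is a practical stand-in for the point of diminishing returns, not a formal definition.

```python
def marginal_cpas(budgets, conversions):
    """Cost of each *extra* conversion between consecutive budget steps."""
    result = []
    for b0, b1, c0, c1 in zip(budgets, budgets[1:], conversions, conversions[1:]):
        extra = c1 - c0
        result.append(float("inf") if extra <= 0 else (b1 - b0) / extra)
    return result


def last_budget_within_target(budgets, conversions, target_cpa):
    """Highest budget reached before the marginal CPA first exceeds target_cpa."""
    for budget, cpa in zip(budgets, marginal_cpas(budgets, conversions)):
        if cpa > target_cpa:
            return budget
    return budgets[-1]


# Hypothetical weekly budget steps ($) and the conversions observed at each
budgets = [1000, 1500, 2000, 2500, 3000]
conversions = [50, 72, 90, 100, 104]

for b, c in zip(budgets, conversions):
    print(f"budget ${b}: average CPA ${b / c:.2f}")
print("marginal CPAs:", [round(x, 2) for x in marginal_cpas(budgets, conversions)])
print("stop increasing at:", last_budget_within_target(budgets, conversions, target_cpa=40))
```

Notice that the average CPA creeps up slowly while the marginal CPA jumps, which is why watching only the average can hide the point where extra spend stops paying for itself.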

 

The law of diminishing returns and Multichannel Marketing

Understand that no single campaign is solely responsible for conversions and sales if you are doing multichannel marketing. Different marketing channels work together to create sales and conversions. Some marketing channels do more to assist conversions than to complete them. We call such channels ‘assisting marketing channels‘. Other marketing channels do more to complete conversions.

So if you overinvest in a particular marketing channel while overlooking the role of assisting marketing channels, you will reach the point of diminishing returns faster than you think. This is because you are adding more of one unit of production (here, budget) to one marketing channel while keeping the other units constant (i.e. not investing a proportional amount in assisting marketing channels).

Please see the related post for more details: Thinking of investing more in a marketing channel? Think Twice.

 

Law of diminishing returns and Last Click Keywords Optimization

Just as we have assisting marketing channels, we have assisting keywords. These keywords do more to assist conversions than to complete them. Similarly, we have last-click keywords. These are the keywords people searched for just before completing a conversion, and they are credited with the conversion in a last-click attribution model.

An average PPC marketer spends a lifetime optimizing for ‘last-click keywords’, assuming that these keywords alone make up the whole conversion funnel. He completely ignores the role played by assisting keywords. So in the case of PPC, if you keep optimizing for last-click keywords while ignoring first- and middle-click keywords (collectively known as assisting keywords), you will at some point get lower per-unit returns.

This means your cost per acquisition will at some point start rising and your profit on sales will start declining. Then the only way to remain within your CPA targets is to tweak your last-click keywords (add, pause or delete them, or change their bids). But this is a suboptimal way of optimizing a PPC campaign, because you are optimizing only a small part of the conversion process.

So in order to strengthen your PPC campaigns, you also need to bid on keywords that initiate or assist conversions. In this way you can stay away from the point of diminishing returns and remain within your CPA targets much longer.

 

Law of diminishing returns and Last Click CPA Optimization

The CPA that you see in your Google AdWords or Google Analytics report is not your actual cost per acquisition. Sorry to disappoint you. It is the cost per last-click conversion. So if you ignore first- and middle-click keywords and optimize PPC campaigns on the basis of cost per last-click conversion, then you won’t get optimal results and may sometimes even lose money.

This is because a keyword that is not completing sales may still be initiating or assisting them. If you stop bidding on it because its cost per last-click conversion (the so-called CPA reported by Google AdWords) is too high, or because it is not completing any conversions, you may lose money.
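To see why a keyword with no last-click conversions is not necessarily dead weight, here is a small sketch comparing last-click credit with a linear attribution model, where each conversion is split equally across every click in its path. Linear attribution is just one standard alternative, used here for illustration; it is not the effective click method described in the related post. The keywords, paths and costs are made up.

```python
from collections import defaultdict


def last_click_credit(paths):
    """Give each conversion entirely to the last keyword in its path."""
    credit = defaultdict(float)
    for path in paths:
        credit[path[-1]] += 1.0
    return credit


def linear_credit(paths):
    """Split each conversion equally across every click in its path."""
    credit = defaultdict(float)
    for path in paths:
        for keyword in path:
            credit[keyword] += 1.0 / len(path)
    return credit


# Hypothetical converting paths (ordered keyword clicks) and keyword costs ($)
paths = [
    ["seo tools", "best seo tools", "buy seo software"],
    ["seo tools", "buy seo software"],
    ["best seo tools", "buy seo software"],
    ["seo tools", "best seo tools"],
]
costs = {"seo tools": 200, "best seo tools": 120, "buy seo software": 150}

last, linear = last_click_credit(paths), linear_credit(paths)
for keyword, cost in costs.items():
    lc = last.get(keyword, 0.0)
    lc_cpa = cost / lc if lc else float("inf")
    print(f"{keyword}: last-click conv={lc}, CPA={lc_cpa:.2f} | "
          f"linear conv={linear[keyword]:.2f}, CPA={cost / linear[keyword]:.2f}")
```

Under last-click, ‘seo tools’ looks like pure cost with an infinite CPA; under linear credit it shares in three of the four conversions.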

Please see the related post for more details: Attribution Modeling Case Study – Introducing Effective Click Optimization

 

80/20 Rule

According to the Pareto principle (also known as the 80/20 rule), 80% of the effects come from 20% of the causes. Applied to marketing, this suggests that:

1. 80% of your sales come from 20% of your visitors.
2. 80% of your output comes from 20% of your input.
3. 80% of your sales come from 20% of your products.
4. 80% of your profit comes from 20% of your products.

So what you need to do is identify that 20% of everything and work relentlessly on it. You can’t sell every product of your client in every location of your country, so why spread your marketing efforts and resources too thin by trying to be visible everywhere for everything you sell?
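Finding "those 20%" can start with ranking items by revenue and keeping the smallest group that adds up to 80% of the total. The product names and revenues below are hypothetical. In this made-up data it takes 30% of products to reach 80% of revenue, a reminder that 80/20 is a rule of thumb rather than an exact law.

```python
def vital_few(revenue_by_item, share=0.8):
    """Smallest set of top items whose combined revenue reaches `share` of the total."""
    total = sum(revenue_by_item.values())
    ranked = sorted(revenue_by_item.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0
    for item, revenue in ranked:
        if running >= share * total:
            break
        selected.append(item)
        running += revenue
    return selected


# Hypothetical annual revenue ($) per product
product_revenue = {
    "A": 50000, "B": 25000, "C": 8000, "D": 5000, "E": 4000,
    "F": 3000, "G": 2000, "H": 1500, "I": 1000, "J": 500,
}

top = vital_few(product_revenue)
total = sum(product_revenue.values())
top_fifth = sorted(product_revenue.values(), reverse=True)[: len(product_revenue) // 5]
print(f"{len(top)} of {len(product_revenue)} products reach 80% of revenue: {top}")
print(f"The top 20% of products bring in {sum(top_fifth) * 100 / total}% of revenue")
```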

Let us suppose that your target market is the US, so your average customer can be anywhere in the US. Let us also suppose that after analyzing one year of data, you found that people from New York City bought 2 times more often than an average visitor to your website, and that they spend 75% more than average per order ($70 vs $40). So now you know where your best customers live.

 

Metric | New York City Clients (Best Clients) | Average Clients (can be anywhere in the US)
No. of transactions/year | 4000 | 2000
Average order value | $70 | $40
Total Revenue | $280000 | $80000
Total Spend | $36000 | $50000
Gross Profit | $244000 | $30000

Your cost per acquisition will be high if you target the whole of the US through search marketing or any other ad campaigns. So it is pretty obvious that your total spend is going to be higher for acquiring average clients.

So by directing your marketing efforts towards acquiring more profitable clients, you can increase your revenue and profit even without increasing traffic or spending more on content creation and marketing. Now the big questions are: why are people from New York City our best clients, what are they purchasing, and what can we do so that they buy more? If you can answer these questions, you can increase your sales within a few weeks without increasing website traffic or spending more on content marketing. And this should be our aim as marketers. I hope this answers my question: why do best customers generate more profit than average or low-value customers?
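The table’s totals follow directly from a few inputs per segment: revenue = transactions × average order value, and the table’s ‘Gross Profit’ = revenue − total spend. A small sketch using the same figures, adding spend per transaction as a rough stand-in for cost per acquisition (an assumption: it treats every transaction as one acquisition).

```python
segments = {
    "New York City clients": {"transactions": 4000, "avg_order_value": 70, "spend": 36000},
    "Average clients": {"transactions": 2000, "avg_order_value": 40, "spend": 50000},
}

summary = {}
for name, s in segments.items():
    revenue = s["transactions"] * s["avg_order_value"]
    profit = revenue - s["spend"]                  # 'Gross Profit' row in the table
    spend_per_transaction = s["spend"] / s["transactions"]
    summary[name] = (revenue, profit, spend_per_transaction)
    print(f"{name}: revenue ${revenue}, profit ${profit}, "
          f"spend per transaction ${spend_per_transaction:.2f}")
```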

 Please see the related post for more details: How to use Web Analytics 2.0 to improve your conversions

 

Statistical Significance and Marketing Decisions

A statistically significant result is one that is unlikely to have occurred by chance alone; a statistically insignificant result could easily have occurred by chance. So when someone asks “Is your result statistically significant?”, they are really asking “How likely is it that a result like yours would show up by chance alone, if there were no real difference?”

Consider the following hypothetical scenario:

 

Campaign | Visits | Transactions | E-Commerce Conversion Rate
Campaign A | 1820 | 150 | 8.24%
Campaign B | 20 | 4 | 20.00%
Campaign C | 780 | 41 | 5.26%

 

Do you think you should be investing more in campaign ‘B’ because its conversion rate is highest?

I would suggest not. The sample size in the case of campaign ‘B’ (4 transactions out of 20 visits) is too small for its conversion rate to be statistically significant. Had campaign B got 1 transaction out of 1 visit, its conversion rate would be 100%. Would that make its performance even better? No.

Do you think you should now be investing in campaign ‘A’ because it has a higher conversion rate than campaign ‘C’?

Are you really sure that the difference between the conversion rates of campaign ‘A’ and campaign ‘C’ is statistically significant?

 

In order to determine whether the difference is statistically significant, you need to conduct a statistical test (like a Z test) to calculate the ‘confidence’ that the difference in the conversion rates of the two campaigns is statistically significant. I am not talking about everyday confidence, but this statistical confidence:

I am the confidence you need, to play with statistical significance

It is the confidence that the result has not occurred by random chance. Statistical significance can be thought of as the confidence one has in a given result. Confidence depends on the signal-to-noise ratio and the sample size, so the confidence that a result has not occurred by random chance is high if the signal is large and/or the sample size is large and/or the noise is low.
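For the curious, here is a minimal sketch of one common version of such a test, a two-proportion z-test with a pooled standard error and the normal approximation, applied to the campaign ‘A’ and ‘C’ figures from the table above. The function name is illustrative, and this is not the only valid test: for tiny samples like campaign ‘B’s 20 visits, the normal approximation is unreliable and an exact test would be a better choice.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, visits_a, conv_b, visits_b):
    """Return (z, two-sided p-value) for H0: both campaigns share one conversion rate."""
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_error == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (rate_a - rate_b) / std_error
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - standard normal CDF at |z|)
    return z, p_value


z, p = two_proportion_z_test(150, 1820, 41, 780)  # campaign A vs campaign C
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

For these particular numbers the sketch gives z of roughly 2.67 and a two-sided p-value of roughly 0.0075, which would usually be read as significant at the 95% level. So the scenario that follows, where the test fails to show significance, is best read as a hypothetical.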

 

Let us assume that after conducting a statistical test we came to the conclusion that the difference in the conversion rates of the two campaigns can’t be proved to be statistically significant. Under these circumstances we cannot conclude that campaign ‘C’ is not performing better. So what can we do then? We need to collect more data and test the difference in the conversion rates of the two campaigns again. At this stage, investing more money in campaign ‘A’ may not produce the results you expect.

You will find yourself conducting more such statistical tests as your statistical knowledge increases.

 

Data Segmentation and Inference

Drawing conclusions like these is technically known as ‘Statistical inference‘. Statistical inference is the process of drawing conclusions from data that is subject to random variation. Observational errors are a common source of faulty inference: you concluded that campaign ‘B’ performs best purely because of the conversion rate you observed in a tiny sample. That statistical inference is wrong.

Statistical inferences are often drawn from a random sample taken from a set of entities (values, potential measurements). This set of entities is known as the statistical population. The set of campaigns above is an example of a statistical population from which statistical inferences (like which is the highest-performing campaign) are drawn. A subset of a statistical population is called a sub-population.

For example:

If you consider a PPC campaign as a statistical population, then its ad groups can be considered sub-populations. To understand the properties of a statistical population, statisticians first separate the population into distinct sub-populations (provided they have distinct properties) and then try to understand the properties of the individual sub-populations.

For the same reason, analytics experts recommend segmenting analytics data before you draw statistical inferences from it. So if you want to understand the performance of a PPC campaign, you should first try to understand the performance of its individual ad groups.

Similarly, if you want to understand the performance of an ad group, you should first try to understand the performance of the keywords and ad copies in that ad group. I hope it is now clear why data segmentation is so important in web analytics.
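A small sketch of why segmenting matters, using made-up keyword data for a hypothetical PPC campaign with two ad groups. The campaign-level conversion rate of 2.8% describes neither ad group: one converts at 10% and the other at 1%.

```python
from collections import defaultdict

# Hypothetical keyword-level rows: (ad group, keyword, visits, conversions)
rows = [
    ("Brand", "acme shoes", 400, 40),
    ("Brand", "acme store", 100, 10),
    ("Generic", "running shoes", 1500, 15),
    ("Generic", "cheap shoes", 500, 5),
]


def conversion_rate(conversions, visits):
    """Conversion rate in %."""
    return conversions * 100 / visits


totals = defaultdict(lambda: [0, 0])  # ad group -> [visits, conversions]
for ad_group, _keyword, visits, conversions in rows:
    totals[ad_group][0] += visits
    totals[ad_group][1] += conversions

campaign_visits = sum(v for v, _ in totals.values())
campaign_conversions = sum(c for _, c in totals.values())
print(f"Campaign: {conversion_rate(campaign_conversions, campaign_visits)}%")
for ad_group, (visits, conversions) in totals.items():
    print(f"  {ad_group}: {conversion_rate(conversions, visits)}%")
```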

Please see the related posts for more details:

 

Correlation and Causation

Correlation measures the relationship between two variables. Let us suppose ‘A’ and ‘B’ are two variables. If ‘B’ goes up as ‘A’ goes up, then ‘A’ and ‘B’ are positively correlated. However, if ‘B’ goes down as ‘A’ goes up, then ‘A’ and ‘B’ are negatively correlated. For example:

 

Here pages/visit is increasing over time but the goal conversion rate is going down. So here user engagement negatively correlates with conversions. When user engagement negatively correlates with conversions, the engagement becomes a distraction. I have talked more about this distraction in the post: How to separate User Engagement from Distraction

You need to look out for such negative correlations. The guy above (with the grumpy boss) is busy focusing on the conversion rate. He probably didn’t realize that the conversion rate may be negatively correlated with revenue and so instead of focusing on conversion rate, he should be focusing on improving revenue.

 

Correlation Coefficient

The correlation coefficient is used to measure the strength of the correlation. Its value ranges from -1 to 1.

-1 means perfect negative correlation. 0 means there is no linear relationship between the two variables. 1 means perfect positive correlation.
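Here is a hand-written sketch of the Pearson correlation coefficient, the most common version of this measure, applied to hypothetical monthly pages/visit and goal conversion rate figures of the kind described above. Python 3.10+ also ships `statistics.correlation`, which computes the same Pearson coefficient; writing it out just makes the formula visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures: pages/visit rising, goal conversion rate (%) falling
pages_per_visit = [2.0, 2.5, 3.0, 3.5, 4.0]
goal_conversion_rate = [5.0, 4.6, 4.1, 3.9, 3.2]
print(round(pearson_r(pages_per_visit, goal_conversion_rate), 2))
```

A value close to -1, as here, only says the two series move in opposite directions; it says nothing about which one, if either, is driving the other.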

 

What is Causation

Causation means that one event happens as a result of another. For example, a rise in temperature increases the sale of cold drinks.

 

Correlation doesn’t imply causation

You can always find some relationship between two variables/events if you really want to. However, the mere presence of a relationship between two variables/events doesn’t imply that one causes the other.

For example, in the graph above there seems to be a relationship between pages/visit and conversion rate. But at this stage, without more analysis, we can’t conclude that the increase in user engagement has resulted in the decrease in conversion rate.

 

ROI Calculations for SEO

Types of SEO ROIs

1. Anticipated ROI
2. Actual ROI – Immediate
3. Actual ROI – long term

Anticipated SEO ROI
= (Anticipated Revenue from SEO efforts – Proposed Cost of the SEO Project)/proposed cost of the SEO project (measured in percentage)

The three things that you need to know in advance in order to calculate Anticipated ROI:
1. Average monthly visits
2. Average order value
3. E-commerce conversion rate of the website

Actual SEO ROI (Immediate)

= [(Total E-Commerce Revenue through SEO + Total Goal Value through SEO) – Total cost of running the SEO campaign] / Total cost of running the SEO campaign (measured in percentage)
Here,
Total Goal Value = Assisting Conversion Value + Last Interaction Conversion Value

 

Actual SEO ROI (long term)
= Immediate ROI*12 (measured in percentage)
{assumes the client will continue to get SEO benefits for at least the next year, even without any further SEO}

 

What does following ROIs mean?
i. ROI of 0%
ii. ROI of 100%
iii. ROI of 1000%
iv. ROI of -100%

ROI of 0% => It means no profit, no loss. You spent ‘x’ and earned ‘x’ in revenue.
ROI of 100% => It means you spent ‘x’ and earned ‘2x’ in revenue.
ROI of 1000% => It means you spent ‘x’ and earned ‘11x’ in revenue.
ROI of -100% => It means you spent ‘x’ and earned 0 in revenue.
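The ROI formula and the four cases above as a quick Python check. The function names are illustrative, and the $30000 / $5000 / $10000 inputs to the immediate SEO ROI are hypothetical.

```python
def roi_pct(cost: float, revenue: float) -> float:
    """ROI (%) = (revenue - cost) / cost * 100, revenue-based as in this article."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) * 100 / cost


def immediate_seo_roi_pct(ecommerce_revenue, total_goal_value, seo_cost):
    """Actual SEO ROI (immediate); Total Goal Value = assisting + last interaction value."""
    return roi_pct(seo_cost, ecommerce_revenue + total_goal_value)


x = 1000  # amount spent, any currency
for earned in (x, 2 * x, 11 * x, 0):
    print(f"spent {x}, earned {earned}: ROI = {roi_pct(x, earned):.0f}%")

print(immediate_seo_roi_pct(30000, 5000, 10000))  # hypothetical inputs
```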

 

Proposed Sale by SEO

It is the amount of sales you promise to generate on your client’s website in return for your total SEO fees. If you generate sales that are less than your total SEO fees, your client will get a negative return on his investment. If you generate sales equal to your total SEO fees, your client will get a 0% return on his investment. In order to generate a reasonably positive ROI (like 100% ROI or more), you must generate sales of at least twice your total SEO fees.

 

Additional orders required to generate proposed Sale

No. of orders required for proposed sale = proposed sale/average order value


For example, if the proposed sale is $20000 and the average order value is $100,
then No. of orders required for proposed sale = $20000/$100 = 200

 

Additional traffic required to generate proposed sale

Additional traffic required to generate proposed sale = no. of orders required to generate proposed sale/ e-commerce conversion rate
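Chaining the last three formulas together: a sketch that starts from the SEO fee, applies the "twice the fee" rule for a 100% ROI, then works out orders and visits. The $10000 fee and the 2% conversion rate are hypothetical; the $100 average order value matches the example above. Rounding up to whole orders and visits is an added choice, since you can’t have a fraction of either.

```python
import math


def proposed_sale(total_seo_fees: float, target_roi_pct: float = 100) -> float:
    """Sales needed so that (sales - fees) / fees * 100 equals the target ROI."""
    return total_seo_fees * (100 + target_roi_pct) / 100


def orders_required(sale: float, average_order_value: float) -> int:
    # Round up: you can't book a fraction of an order
    return math.ceil(sale / average_order_value)


def additional_traffic_required(orders: int, ecommerce_conversion_rate_pct: float) -> int:
    # Conversion rate in percent (2 means 2%); rounded up to whole visits
    return math.ceil(orders * 100 / ecommerce_conversion_rate_pct)


sale = proposed_sale(10000)                       # hypothetical $10000 fee, 100% ROI target
orders = orders_required(sale, 100)               # $100 average order value
visits = additional_traffic_required(orders, 2)   # hypothetical 2% conversion rate
print(sale, orders, visits)
```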

 Please see the related posts for more details:

 

Now it is your turn. How do you use statistics in your analysis? Please share your thoughts and insights.

Other Posts you may find useful: Analyze backlinks through Link Building Dashboards

 


 

About the Author:



My business thrives on referrals, so I really appreciate recommendations to people who would benefit from my help. Please feel free to endorse/forward my LinkedIn Profile to your clients, colleagues, friends and others you feel would benefit from SEO, PPC or Web Analytics.

 

 

  • Jim Bob

    Incorrect:
    “ROI of 1000% => It means you spent ‘x’ and earned ‘20x’ in revenue.”

    Corrected
    ROI of 1000% => It means you spent ‘x’ and earned ‘11x’ in revenue.

    • http://stuti90.tumblr.com Stuti Dhanvada

      ^ What he said.

      You should correct it!

      Loved the rest of the article.

    • seohimanshu

      Good catch.

  • http://www.facebook.com/art.morehead Art Morehead

    Once again over the top, your timing couldn’t have been better..

  • Rich Urban

    WOW, you are the man

  • http://twitter.com/randyzwitch Randy Zwitch

    For the “mean” section, it would be more accurate to just take the weighted average bounce rate as a function of pageviews/entries rather than throwing out your outliers.

    For example, what if your homepage was the 100% bounce rate page and represented 80% of your entry pages? It’s hard to argue that using the home page in your calculation represents an outlier.

  • Nitin Choley

    Awesome post
    unable to view the Video in India

    • seohimanshu

      Sorry about that :(

  • Alex Brown

    Wonderful education piece, Himanshu!

    • seohimanshu

      Thanks Alex

  • preeti prakash

    thanks for sharing the information which i was searching from couple of days . Expecting more from you on the same topic.

    • seohimanshu

      I will be coming up with a new post soon on similar topic soon.

  • http://www.canuckseo.com/ Jim Rudnick

    Basic. Elementary. Building blocks that count. All of these are true, Him….and yet from my own client list I know that a full 50% of same do not know any of this. What ARE they teaching in high school would be my first question…
    But yes, great piece here!!!!

    • seohimanshu

      Thanks for the kind words. I think people forget most of the stuff they learned in school once but which they don’t use in their daily lives. Blame it to information overload.

  • Rahul Kharnokhya

    Himashu, This is really a great post and as being a digital marketing manager I am not aware about some facts. Thanks for sharing such a nice post. I am regular reader of your blog and expect some more insights about digital analytics. Please share some posts or blogs which are worth to read about digital or predictive analytics.
    Thanks again dear!

  • Kishore

    Hey dis is really wonderful information that i am looking for.Thank you so much..Can u suggest some other blog links that explain usage of statistical techniques in web analytics

    • seotakeaways

      I am not sure about any other blog which explain usage of statistical techniques in web analytics.