Guide to BigQuery Cost optimization

Google BigQuery can get expensive quickly if you process terabytes or petabytes of data every day and either construct your queries carelessly or pull too much data too often.

Your monthly cost of using BigQuery depends upon the following three factors:

1) The cost of connecting your Google Analytics account to BigQuery

2) The amount of data you store in BigQuery (i.e., the storage cost)

3) The amount of data processed by each query you run (i.e., the query cost)

Keep the following points in mind when doing BigQuery cost optimization:

#1 Before you query the data from a table, check the size of the table

If the table is just a few kilobytes (KB) or megabytes (MB) in size, you don’t need to worry.

But if the table size is in gigabytes (GB) or terabytes (TB), you should be careful how you query your data.
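A table's size can also be read programmatically. The following is a minimal sketch, assuming the `google-cloud-bigquery` Python client library is installed and credentials are already configured. The helper names `format_bytes` and `table_size` are hypothetical, introduced here for illustration, and sizes are shown in decimal units (1 KB = 1000 bytes) for simplicity.

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count with decimal units (1 KB = 1000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} PB"


def table_size(table_id: str) -> str:
    """Look up a table's stored size from its metadata, without running a query.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    `table_id` has the form "project.dataset.table".
    """
    from google.cloud import bigquery  # third-party package, imported lazily

    client = bigquery.Client()
    table = client.get_table(table_id)  # metadata lookup only
    return format_bytes(table.num_bytes)
```

Calling `table_size("your-project.your_dataset.your_table")` (a placeholder ID) would return a string such as a size in MB or GB, which tells you whether extra care is needed before querying.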

#2 Before you query the data from a table, preview the table 

Many people, especially new users, run queries just to preview the data in a table. This can cost you considerably if you accidentally query gigabytes or terabytes of data.

Instead of running queries just to preview the data in a data table, click on the ‘Preview’ tab to preview the table.

There is no cost for previewing the data table. 

The table preview gives you an idea of what type of data is available in the table without querying it.

#3 Always keep an eye on how much data your query will process before you run your query

If your query is going to process only kilobytes or megabytes of data, you don’t need to worry.

However, if your query is going to process gigabytes or terabytes of data, it could cost you considerably. If that is the case, query only the data that is absolutely necessary.
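The same estimate can be obtained from code with a dry run, which validates a query and reports the bytes it would process without executing it. This is a minimal sketch, assuming the `google-cloud-bigquery` Python client library and configured credentials. The function names and the 1 GB review threshold are illustrative assumptions, not part of any official workflow.

```python
def estimate_bytes_processed(sql: str, client=None) -> int:
    """Return the bytes a query would process, using a dry run.

    A dry run does not execute the query. Assumes google-cloud-bigquery
    is installed and credentials are configured.
    """
    from google.cloud import bigquery  # third-party package, imported lazily

    client = client or bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    query_job = client.query(sql, job_config=job_config)
    return query_job.total_bytes_processed


def needs_review(bytes_processed: int, threshold_bytes: int = 1_000_000_000) -> bool:
    """Flag queries that would process at least `threshold_bytes` (default 1 GB)."""
    return bytes_processed >= threshold_bytes
```

A query flagged by `needs_review` is a candidate for trimming columns or narrowing the data it reads before you run it for real.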

#4 Your query cost depends on the number and/or size of the columns returned

Returning 10 rows/records costs the same as returning 10,000 records of data.

So the number of rows/records your query returns doesn’t affect your query cost.

Your query cost is affected by the number of columns your query returns.

Consider a query that returns one column named ‘id’, and then a query that returns two columns named ‘id’ and ‘creation_date’. Just by adding the second column, the amount of data processed increases by 1 GB.

Now, what would happen if we wrote a query that returns all the columns of the table? In this example, 99.7 GB of data would be processed.

So only query the columns you really need. 

Your query cost is also affected by the size of each column.

In this example, a query that returns only the ‘id’ column processes 236 MB of data, whereas a query that returns only the ‘body’ column processes 26.3 GB.

So you need to be very careful about the size of the column you want to retrieve. 
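Because BigQuery stores data by column, the data a query processes can be approximated as the combined size of the columns it references. The sketch below models that arithmetic. The sizes for ‘id’ and ‘body’ are the figures quoted in this section, the size for ‘creation_date’ is inferred from the 1 GB increase mentioned above, and the function name is hypothetical.

```python
from typing import Iterable

# Approximate column sizes in bytes, based on the figures quoted in this section.
COLUMN_SIZES = {
    "id": 236_000_000,               # 236 MB
    "creation_date": 1_000_000_000,  # about 1 GB, inferred from the increase noted above
    "body": 26_300_000_000,          # 26.3 GB
}


def bytes_processed(columns: Iterable[str]) -> int:
    """Sum the sizes of the referenced columns.

    The number of rows returned plays no part in this estimate.
    Each column is counted once, even if it is listed twice.
    """
    return sum(COLUMN_SIZES[name] for name in set(columns))
```

Under this model, selecting ‘body’ alone costs over a hundred times more than selecting ‘id’ alone, even though both queries return the same number of rows.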

#5 Avoid using SELECT *

SELECT * returns all the columns of the table.

If your table contains a lot of columns and some of them are very large (maybe gigabytes or terabytes in size), using SELECT * could increase your query cost considerably.

So the best practice is to avoid using SELECT *.

#6 Applying a LIMIT clause to a SELECT * query does not affect the query cost

This is because the LIMIT clause controls the number of rows/records your query returns. But as you know by now, the number of rows/records your query returns doesn’t affect your query cost.

The amount of data processed is the same with the LIMIT clause as without it.
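To check this against your own tables, you could compare the estimated bytes for a query with and without a LIMIT clause. The sketch below is one way to do that. `estimate` stands for any function that takes a SQL string and returns the bytes it would process (for example, one backed by a BigQuery dry run), and both function names are hypothetical.

```python
from typing import Callable


def with_limit(sql: str, limit: int) -> str:
    """Append a LIMIT clause to a query, dropping any trailing semicolon."""
    return f"{sql.rstrip().rstrip(';')} LIMIT {limit}"


def limit_saves_bytes(estimate: Callable[[str], int], sql: str, limit: int = 10) -> bool:
    """Return True only if adding LIMIT lowers the estimated bytes processed."""
    return estimate(with_limit(sql, limit)) < estimate(sql)
```

For the SELECT * case described in this section, such a check would be expected to return False, since both versions of the query read the same columns in full.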

#7 Set up Budget alerts

Set up Cloud Billing budgets and budget alerts, which trigger email notifications to billing admins and/or project managers when your costs (actual or forecasted) exceed a percentage of your budget, based on the threshold rules you set.

These email alerts inform you of how your usage costs are trending over time.

Note: Setting up a budget does not automatically cap Google Cloud usage or spending.
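The threshold rules come down to simple percentage arithmetic. The following sketch shows which thresholds a given spend has reached. The function name and the 50%, 90%, and 100% defaults are assumptions for illustration, and the same calculation applies whether `spend` is an actual or a forecasted cost.

```python
from typing import List, Sequence


def triggered_thresholds(
    budget: float, spend: float, thresholds: Sequence[float] = (0.5, 0.9, 1.0)
) -> List[float]:
    """Return the threshold fractions of `budget` that `spend` has reached or passed."""
    return [t for t in thresholds if spend >= budget * t]
```

With a budget of 1,000 and a spend of 920, the 50% and 90% thresholds have been crossed, so two alerts would have been sent, but nothing stops the spend from continuing past 100%.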

For more information on setting up cloud billing budgets and budget alerts, check out the official help documentation from Google: https://cloud.google.com/billing/docs/how-to/budgets

#8 Set up Quota limits

You can turn on cost control at the project level or user level by setting up/customizing quota limits. That way, you can put a cap on the maximum number of bytes processed per day by a given user or project.

When a user/project exceeds their quota limit, the query is not processed and a “quota exceeded” error message is displayed.
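The behaviour described above can be pictured with a small model. This is a simplified sketch of the idea only, not how BigQuery implements quotas, and the class and method names are hypothetical.

```python
class QuotaExceededError(Exception):
    """Raised when a query would push usage past the daily limit."""


class DailyQuota:
    """Toy model of a per-user or per-project cap on bytes processed per day."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def run_query(self, query_bytes: int) -> None:
        """Record a query's usage, or refuse it if it would exceed the limit."""
        if self.used_bytes + query_bytes > self.limit_bytes:
            raise QuotaExceededError("quota exceeded")
        self.used_bytes += query_bytes
```

In this model, a user with a 1 TB daily limit who has already processed 600 GB would see a second 600 GB query rejected, and their recorded usage would stay at 600 GB.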

To learn more about working with Quotas, check out the official help documentation from Google: https://cloud.google.com/docs/quota

#9 Regularly monitor your spending

At least once a week, visit the ‘Billing’ section of your Google Cloud Platform account to see how much you have spent so far.

#10 Use the Google Cloud pricing calculator

The Google Cloud pricing calculator is used to estimate the storage cost and/or the cost of running your desired query before you actually run it. 

However, this calculator works only when you are querying the data in terabytes or petabytes.

1 petabyte (PB) = 1000 terabytes (TB)
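The arithmetic behind such an estimate is a unit conversion followed by a multiplication. The sketch below assumes decimal units, consistent with the conversion above. The function names are hypothetical, and `price_per_tb` is deliberately left as a parameter because the actual rate depends on your pricing model and region, so take it from Google's current pricing page.

```python
TB_PER_PB = 1000  # 1 PB = 1000 TB, as stated above

# Conversion factors from each unit to terabytes (decimal units assumed).
_TO_TB = {"MB": 1e-6, "GB": 1e-3, "TB": 1.0, "PB": float(TB_PER_PB)}


def to_terabytes(amount: float, unit: str) -> float:
    """Convert an amount of data in MB, GB, TB, or PB to terabytes."""
    return amount * _TO_TB[unit]


def query_cost(amount: float, unit: str, price_per_tb: float) -> float:
    """Estimate query cost as terabytes processed times the price per terabyte."""
    return to_terabytes(amount, unit) * price_per_tb
```

For example, `to_terabytes(1.1, "PB")` gives roughly 1100 TB, and passing your own rate to `query_cost` turns that into a cost figure you can compare against the calculator's estimate.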

Follow the steps below to use the Google Cloud pricing calculator:

Step-1: Use the query validator in the Cloud console to calculate the amount of data that will be processed when you run the query.

Step-2: Navigate to the Google Cloud pricing calculator: https://cloud.google.com/products/calculator

Step-3: Click on the first field.

Step-4: Select ‘BigQuery’ from the drop-down menu.

Step-5: Click on the ‘FLAT-RATE’ tab if flat-rate pricing applies to your billing account. Otherwise, move on to the next step.

Step-6: Enter the name of your data table.

Step-7: Enter the location where BigQuery will be used.

Step-8: Use the following configuration for ‘storage pricing’ as we are not estimating the storage cost:

Step-9: Enter the amount of data that will be processed when you run the query. In my case, it would be 1.1 petabytes (PB).

Step-10: Click on the ‘ADD TO ESTIMATE’ button.

You should now be able to see the estimated cost on the right-hand side of your screen.

#11 Use the “BigQuery Mate” Chrome extension

BigQuery Mate is a Google Chrome extension through which you can estimate query cost within the query validator. 

Unlike the Google Cloud pricing calculator, this extension works even when you are querying less than 1 terabyte of data.

To use this extension follow the steps below:

Step-1: Download the BigQuery Mate Chrome extension.

Step-2: Refresh your BigQuery console by clicking on the browser refresh button.

Step-3: Type your query in the ‘Query Editor’. You should now be able to see the query cost estimate on the right-hand side of your query validator.

To learn about using Google BigQuery for Google Analytics, check out this article: Google Analytics BigQuery Tutorial

About the Author

Himanshu Sharma

  • Founder, OptimizeSmart.com
  • Over 15 years of experience in digital analytics and marketing
  • Author of four best-selling books on digital analytics and conversion optimization
  • Nominated for Digital Analytics Association Awards for Excellence
  • Runs one of the most popular blogs in the world on digital analytics
  • Consultant to countless small and big businesses over the decade