Understanding BigQuery ML: Definition, Applications, and Example Use Case

Matías Salinas
3 min readJul 2, 2023

--

What is BigQuery ML?

BigQuery ML (Machine Learning) is an integrated feature within Google Cloud’s fully-managed, serverless data warehouse service, BigQuery. It empowers users to create, train, and evaluate machine learning models using standard SQL language. This dramatically simplifies the process, eliminating the need to export data into specialized ML tools and making machine learning more accessible to SQL professionals.

What is BigQuery ML used for?

BigQuery ML finds its primary use in predictive modeling and machine learning tasks. It supports a range of models, including linear regression, logistic regression, k-means clustering, and time series forecasting, among others.

This tool is particularly advantageous for data analysts, enabling them to leverage their SQL skills to create machine learning models without needing to learn a new programming language or delve into complex ML frameworks. Additionally, BigQuery ML seamlessly integrates with other Google Cloud services, thereby providing a holistic platform for data analysis and predictive modeling.

BigQuery ML Use Case: Training, Evaluation, and Prediction

Let’s imagine we have a dataset in BigQuery comprising historical sales data for a retail company. We aim to predict future sales based on this data.

Training

We begin by training a linear regression model using our historical data. Here’s how you might accomplish this with BigQuery ML:

CREATE OR REPLACE MODEL `my_project.my_dataset.sales_forecast_model`
OPTIONS(model_type='linear_reg') AS
SELECT
total_sales AS label,
month,
day_of_week,
holidays
FROM
`my_project.my_dataset.sales_data`

The CREATE OR REPLACE MODEL statement generates a new BigQuery ML model. The model_type option is set to 'linear_reg' for linear regression. The AS SELECT clause specifies the data used to train the model.

Evaluation

Once the model has been trained, we can evaluate its performance using the ML.EVALUATE function:

SELECT
*
FROM
ML.EVALUATE(MODEL `my_project.my_dataset.sales_forecast_model`)

This query returns a series of evaluation metrics for the model, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-Squared, which provide an indication of the model’s accuracy.

The output might look like:

+------+-------------------+-------------------+
| row | mean_absolute_err | mean_squared_err |
+------+-------------------+-------------------+
| 1 | 10.7893 | 234.5467 |
+------+-------------------+-------------------+

Prediction

Having evaluated and fine-tuned our model as required, we can now employ it to predict future sales using the ML.PREDICT function:

SELECT
month,
day_of_week,
holidays,
predicted_label
FROM
ML.PREDICT(MODEL `my_project.my_dataset.sales_forecast_model`,
(SELECT
month,
day_of_week,
holidays
FROM
`my_project.my_dataset.future_sales_data`))

In this query, ML.PREDICT applies the trained model to new data (here, future_sales_data), yielding predictions (predicted_label) for future sales.

The output might look like:

+-------+-------------+----------+------------------+
| month | day_of_week | holidays | predicted_label |
+-------+-------------+----------+------------------+
| 11 | 5 | 0 | 234.1234 |
| 12 | 4 | 1 | 456.7890 |
+-------+-------------+----------+------------------+

In summary, BigQuery ML presents an accessible, robust platform for data analysts to execute machine learning tasks within a familiar SQL environment, greatly enhancing their predictive modeling capabilities.

--

--

Matías Salinas
Matías Salinas

No responses yet