How to Load Data from Pandas DataFrames into BigQuery

Matías Salinas
2 min read · Mar 12, 2023


Google BigQuery is a fully managed, cloud-native data warehouse for storing and analyzing large amounts of data. In this article, we’ll load data from a Pandas DataFrame into a BigQuery table using the google-cloud-bigquery client library. (The Pandas GBQ library, installed below, offers an alternative upload path through its to_gbq function.)

Prerequisites

Before we begin, you’ll need:

  • A Google Cloud Platform (GCP) account
  • A BigQuery project and dataset
  • Python installed on your local machine
  • The Pandas, Pandas GBQ, google-cloud-bigquery, and pyarrow libraries installed (pyarrow is what the BigQuery client uses to serialize a DataFrame for loading)

Installation

To install the required libraries, run the following command:

pip install pandas pandas-gbq google-cloud-bigquery pyarrow
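
Before moving on, it can help to confirm that the packages are importable from the Python environment you plan to use. The snippet below is a minimal sketch of such a check. The helper name find_missing_modules and the module list are assumptions made for illustration, not part of any of these libraries.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package (e.g. "google.cloud") is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Import names for the packages installed above (assumed list for this article)
REQUIRED = ["pandas", "pandas_gbq", "google.cloud.bigquery", "pyarrow"]

missing = find_missing_modules(REQUIRED)
print("Missing modules:", missing or "none")
```

If anything is reported as missing, re-run the pip command inside the same environment that runs your script.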

Code Example

Let’s say you have a CSV file named “data.csv” that you want to load into a Pandas DataFrame and then into a BigQuery table. Here’s an example of how you can do this:

import pandas as pd
from google.cloud import bigquery

# Create a BigQuery client; credentials and the default project come from the environment
client = bigquery.Client()

# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('data.csv')

# Define the BigQuery table schema (names must match the DataFrame's columns)
schema = [
    bigquery.SchemaField('column1', 'STRING'),
    bigquery.SchemaField('column2', 'INTEGER'),
    bigquery.SchemaField('column3', 'FLOAT'),
]

# Create the BigQuery table; bigquery.Table needs a fully qualified ID (project.dataset.table)
table_id = f'{client.project}.my_dataset.my_table'
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table, exists_ok=True)  # Do not fail if the table already exists

# Load the DataFrame into the BigQuery table.
# CSV options such as skip_leading_rows are not needed here: the rows come
# from the DataFrame, not from the CSV file.
job_config = bigquery.LoadJobConfig(schema=schema)
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result()  # Waits for the job to complete.

print(f'Loaded {job.output_rows} rows into {table_id}.')

In this example, we first create a BigQuery client using the google.cloud.bigquery library; the client picks up credentials from your environment (Application Default Credentials). We then load the CSV file into a Pandas DataFrame using the pd.read_csv() function. Next, we define the BigQuery table schema, whose column names must match the DataFrame’s columns, and create the table using the client.create_table() method. Finally, we load the DataFrame into the table using the client.load_table_from_dataframe() method and wait for the load job to finish with job.result().
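
For comparison, the same upload can be sketched with the Pandas GBQ library, which wraps the load in a single to_gbq call. This is a minimal sketch under stated assumptions: the wrapper name load_dataframe_with_pandas_gbq is hypothetical, and the project ID in the commented usage is a placeholder.

```python
def load_dataframe_with_pandas_gbq(df, destination_table, project_id,
                                   if_exists="fail", table_schema=None):
    """Upload a DataFrame to BigQuery via pandas_gbq.to_gbq.

    destination_table uses the 'dataset.table' form. if_exists controls what
    happens when the table already exists: 'fail', 'replace', or 'append'.
    """
    allowed = {"fail", "replace", "append"}
    if if_exists not in allowed:
        raise ValueError(f"if_exists must be one of {sorted(allowed)}, got {if_exists!r}")

    # Imported here so the check above works even without pandas-gbq installed
    import pandas_gbq

    pandas_gbq.to_gbq(
        df,
        destination_table,
        project_id=project_id,
        if_exists=if_exists,
        table_schema=table_schema,  # Optional list of {'name': ..., 'type': ...} dicts
    )


# Example call (the project ID is a placeholder):
# import pandas as pd
# df = pd.read_csv('data.csv')
# load_dataframe_with_pandas_gbq(df, 'my_dataset.my_table', 'my-project', if_exists='append')
```

With to_gbq, the table is created for you if it does not exist. If you leave table_schema unset, the schema is inferred from the DataFrame's dtypes.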

Conclusion

Loading data from Pandas into BigQuery takes only a few lines with the google-cloud-bigquery client library, and the Pandas GBQ library offers a similar shortcut. Either way, you can prepare data with the flexibility of Pandas and then store and analyze large amounts of it in BigQuery.
