How to Import CSV into PostgresSQL

Airplane is the developer platform for building custom internal tools. You can transform scripts, queries, APIs, and more into powerful workflows and UIs that anyone on your team can use. It also offers essential features that streamline task management and facilitate seamless integration with databases and other existing systems.

The basic building blocks of Airplane are Tasks, which are single or multi-step functions that anyone on your team can execute. A Task can use SQL queries, code, or API calls.

This tutorial focuses on importing data from a CSV file into a PostgreSQL database. It explores different methods, including SQL queries and Airplane Tasks, and provides a guide on creating and executing an Airplane Task for seamless data upload.

Use cases for CSV file uploads

Before diving into the tutorial, let's take a closer look at comma-separated values (CSV) files. These are plain text files with the .csv file extension that contain data records, usually separated by a comma. They are typically used to store and exchange structured information, especially between programs that can open a CSV file.

Individuals and organizations import CSV files for various reasons, including:

Bulk data import: CSV files allow multiple simultaneous data uploads. For example, Shopify permits merchants to import a large number of products into their Shopify store at the same time using a CSV file. It requires less time and effort compared to adding the items one after another.
Data migration: This involves data transfer from one storage system to another. CSV files facilitate seamless data transfer. For instance, when carrying out data storage equipment replacements, maintenance, or upgrades, CSV files can store and preserve the data until the storage device is available.
Data integration: CSV files also allow you to gather data from various sources into a single repository. For example, you could import employee data in a unified format from individual branches of a multinational company into a central database. This process provides multiple benefits, including streamlining data analysis and other business processes.

CSV files offer flexibility and efficiency when working with data. They simplify data handling, including bulk transfer and data synchronization. CSV is also a popular and widespread data format and the preferred choice for various data-related tasks.

Importing CSV files into PostgreSQL

There are different methods for importing CSV data into a PostgreSQL database. The following tutorial demonstrates how to create a table in the database and populate it with the CSV data using the pgAdmin GUI, SQL commands, and Airplane Tasks.

Prerequisites

This tutorial requires a good understanding of Postgres, PostgreSQL queries, and CSV files. It also uses the following:

An employees.csv file: Create your own or download this sample file.
A local PostgreSQL database: Follow this tutorial to create one.
A hosted PostgreSQL database: Build a PostgreSQL database instance using the Amazon Relational Database Service (Amazon RDS). Make sure to note the master password, as Amazon RDS does not display it with the other database connection details.
An active Amazon S3 bucket: Follow the official documentation to create an AWS account and an S3 bucket.

Using pgAdmin

pgAdmin is a GUI that allows users to interact with the PostgreSQL database. Install pgAdmin from the official website or as a component when setting up PostgreSQL:

Creating the table

In this section, we'll add an employees table with the columns firstname, lastname, department, and city to the PostgreSQL database.

It's important to note that the CSV file and PostgreSQL table must have the same name. They must also have matching column names.

pgAdmin displays a list of Postgres servers, their databases, and other menu options on the left side of the screen. To create a table, click the database to expose a dropdown menu.

Go to Schemas > public > Tables and right-click Tables to reveal another dropdown menu. Then, go to Create > Table to open a dialog:

In the dialog, set the table name under the General tab to employees:

Next, add individual fields for firstname, lastname, department, and city under the Columns tab. Each one should have the character varying data type and a character length of 255:

Confirm the input and click Save to close the dialog:

We can find the new employees table in the Tables dropdown menu.

Importing the CSV data

We'll now populate the new table with data from the CSV file. First, right-click the employees table and select Import/Export Data to reveal another dialog:

The dialog displays the import/export settings. This section will focus on the basic import settings.

Under the Options tab:

Select Import.
Enter the path to the CSV file or browse the local machine.
Set the format to csv.
Turn on the Header option to separate the header row.
Click OK to close the dialog.

A notification will pop up in the lower right corner, showing the details and status of the import:

To confirm, right-click the table and go to View/Edit Data > All rows to output all entries in the table:

The employees table should now have the same number of rows as the CSV file.

Importing a CSV file using an SQL command

We can also import the CSV data with an SQL command. This section uses the previously created database to demonstrate CSV file importation via the SQL COPY command.

First, delete the data in the employees table using the SQL TRUNCATE statement:

pgSQL

TRUNCATE is a fast way of removing all existing rows in a table without disturbing the structure.

Run the following command to confirm that the table is empty:

pgSQL

It should return 0 rows.

Next, import the CSV file using the PostgreSQL COPY query:

pgSQL

This query copies data between the file and the table. In the code above:

The columns firstname, lastname, department, and city are specified to receive the corresponding data from the file.
FROM specifies the location of the file. Replace /path/to/file/employees.csv with the correct path to employees.csv on your local machine.
DELIMITER ',' indicates that the comma character is the field delimiter in the CSV file.
The CSV option means that the file is in CSV format.
HEADER specifies that the first row in the file contains the column headers.

Run the last SELECT statement again. It should now return the same number of rows as the CSV file.

Using an Airplane SQL Task

If you don't have one already, create an Airplane account and install the Airplane CLI on your system. This section will cover the steps required for successful task execution.

Uploading the CSV File

Upload the employees.csv file to the S3 bucket we created in the prerequisites section using the AWS S3 console:

Next, enable public read access to the uploaded object to ensure the database can access the file:

Finally, confirm that the file is publicly accessible by copying the object URL from the Amazon S3 console and running it in the browser:

For example, visiting the object URL https://prac-s3.s3.eu-central-1.amazonaws.com/employees.csv should successfully download an employees.csv file.

Connecting to the Amazon RDS PostgreSQL Instance

Airplane accepts different database types, including PostgreSQL, via the Resources dashboard. We can integrate the hosted database into the Airplane account by following the steps in this section.

Open the Airplane dashboard and click Settings on the left navigation bar:

Go to Developer settings > Resources, then click New resource at the top-right corner of the Resources page:

Select PostgreSQL under Databases:

We'll then provide the details required to connect to the database, including a human-readable name for the resource and database connection details. Follow the instructions in the AWS documentation to retrieve the database connection details, then enter them:

Test the connection to confirm that it works before saving the details:

Once saved, the new resource will appear on the Resources page, along with its type, name, and slug:

Configuring the Amazon RDS Instance

The database requires the aws_s3 extension to communicate with the S3 bucket. In this section, we'll install the extension and create an employees table in the integrated database. First, set up the database to receive data from S3.

Run the following to open the psql terminal:

bash

Install the aws_s3 extension to Postgres:

pgSQL

Run the \dx command to confirm that the extension exists for the database:

Next, run the following to create the employees table:

pgSQL

Creating an SQL Task

In Airplane, SQL Tasks execute SQL queries to perform various operations on SQL databases. We'll create an SQL Task to populate the employees table with data from the CSV file. We should use a new, empty directory when creating a new SQL Task in Airplane.

Create a file directory called airplane-getting-started:

bash

Navigate to the new folder:

bash

Then, initiate the process by running the init command:

bash

The init command presents some prompts for details, including the task name and type. The responses will determine the structure of the project.

Name the task csv_upload and choose SQL as the task type:

The airplane-getting-started folder should now have two new files (a script file and a config file):

The config file is a .task.yaml file that defines the SQL task, including the task name, task description, database resource, and the path to the SQL script. It is prefilled with the task name and slug.
The SQL script is a separate file with the .sql extension and contains all the SQL statements needed for this task.

Confirm this by running the ls (macOS) or dir (Windows) command in the terminal:

Configuring the SQL Task

Before testing the created task, we must update the config file with the specific configuration details required for successful execution. These include what database resource to use and any necessary environment variables. The Airplane documentation highlights all supported SQL Task configuration fields and their purposes.

First, add the database resource slug from the Resources page to specify the target database:

Airplane automatically generates the slug using the assigned resource name. Assign the slug value to resource under the SQL configurations:

yaml

We'll also need the S3 bucket name and AWS region from the Amazon S3 console:

Insert both into the sql.queryArgs object:

yaml

This task needs AWS credentials to access the S3 bucket. This data is sensitive and should be kept confidential.

Follow the instructions from the AWS documentation to retrieve the access key ID and secret access key. Then, go to Settings > Developer settings > Config variables on the Airplane dashboard and create new config variables:

On the New config variable screen, insert AWS_ACCESS_KEY under the Name field and the AWS access key ID under the Value field. Click the Secret toggle to encrypt the value, then click Create to save the new config variable:

Repeat the above to create another config variable with AWS_SECRET_KEY as the config name and the AWS secret access key as the value:

Attach the AWS_ACCESS_KEY and AWS_SECRET_KEY variables to sql.configs:

yaml

Set the variables as query parameters for the SQL script:

yaml

Here is the complete configuration setup for this task:

yaml

Updating the SQL script

With the configuration details in place, let's now add the SQL SELECT query to the SQL script. This script determines the database operations that the task executes.

Populate the .sql file with the following:

pgSQL

The SELECT statement imports the S3 file into the employees table with the aws_s3.table_import_from_s3 function.

The function accepts the following arguments:

table_name is the name of the Postgres table (employees).
column_list is a string of the columns to import ('firstname,lastname,department,city').
options are PostgreSQL COPY arguments, where '(format csv, header true)' indicates a CSV file with a header line.
:s3_bucket_name refers to the S3 bucket name under sql.queryArgs.
file_path is the object key (/employees.csv).
:s3_region is the S3 region under sql.queryArgs.
:aws_access_key and :aws_secret_key reference the AWS credentials in the config file under sql.queryArgs.

Executing the Task in Airplane Studio

Confirm that the task configuration works as expected by testing it locally before deploying the SQL Task. Airplane Studio provides a rich local development environment for this purpose. The Studio mimics the Airplane app, allowing us to set up resources and configs and execute tasks locally, among other features.

Run this command to start the local dev server:

bash

In this command, csv_upload.task.yaml specifies the task configurations to register with the Studio, while the --env prod flag specifies the fallback environment. The task uses resources and config variables from the production environment.

The output should look similar to the following:

Copy the provided link from the terminal and paste it into a web browser to launch the Studio, or press the Enter key.

The Studio has a Configure tab for updating and verifying the task's configuration details. Changes made here while the Studio is active will automatically reflect in the config file.

Click the Execute task button to run the query:

The action should return a successful response:

Run the following in the psql terminal to confirm the imported data:

pgSQL

Executing the Task via the dashboard

Run the following command to deploy the task:

bash

The airplane deploy command deploys all tasks in the current directory to Airplane. It confirms the target environment before deploying and returns a link after that opens the deployment on the Airplane dashboard:

Go to the Library and click the deployed task:

On the task page, click Execute task to run it:

When the task is running, we should see a successful response, similar to the one shown in the Studio:

Airplane Tasks are an efficient way to interact with a database for many reasons. For example, we can make use of group- or user-based permissions that control task access and execution. Teams can employ this to enforce alignment with internal database security policies.

Airplane also supports the automated execution of recurring database operations via its Schedules feature. We could use a schedule to occasionally update a table with CSV file data. The database will stay effortlessly synchronized with the file and reflect updates over time.

Guidelines for working with CSV files

Working with CSV files requires specific techniques for optimal results. They include the following:

Encode the CSV file in UTF-8: The UTF-8 format is currently one of the most popular encoding systems. It preserves the integrity of the original CSV data across multiple platforms as it supports almost every character, including special characters.
Add a header row with descriptive titles: A header row must be the first line in the file. This row clarifies what each column represents, improving data interpretation.
Choose a delimiter that does not conflict with the CSV file data: Despite being one of the most common delimiters, the comma (,) may not always be the most suitable choice. For instance, using commas as a delimiter can lead to data misinterpretation if the file has large numbers that use commas. Other delimiter options include the semi-colon (;) and pipe (|).
Define the boundaries of column values with quotation marks: Quotation marks wrap together text that should be kept as one value, especially text that contains special characters, delimiters, or line breaks. For example, we should use quotation marks for a field value with embedded commas in a CSV file with the comma delimiter.

Conclusion

This tutorial covered various methods of importing CSV files into a PostgreSQL database, including Airplane Tasks. Airplane provides tools that improve team collaboration and productivity. By following the step-by-step instructions, you successfully executed a CSV file upload and can see the benefits that Airplane Tasks offer over other methods.

To try it out yourself and build your first Airplane Task, sign up for a free account or book a demo.

Author: Fiyin Akinsiku

Fiyin is a backend engineer who is continually impressed by the many ways technology makes life easier. Fiyin currently works as part of a fintech company and enjoys writing.

How to import a CSV into PostgreSQL

Use cases for CSV file uploads

Importing CSV files into PostgreSQL

Prerequisites

Using pgAdmin

Creating the table

Importing the CSV data

Importing a CSV file using an SQL command

Using an Airplane SQL Task

Uploading the CSV File

Connecting to the Amazon RDS PostgreSQL Instance

Configuring the Amazon RDS Instance

Creating an SQL Task

Configuring the SQL Task

Updating the SQL script

Executing the Task in Airplane Studio

Executing the Task via the dashboard

Guidelines for working with CSV files

Conclusion

An Internal.io user's guide to Airplane

How to set up and manage Python cron jobs