Slurping Data Quickly into Postgres
You might think "let's use SQL statements", like... one is supposed to, with a relational database...?
You can go faster though!
In psql, the command-line tool, there is a command called \copy. It can load up files in various formats, e.g. CSV, and turn them into a table for you.
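For an ordinary CSV with a header row, that's really all there is to it; a minimal sketch (the table and file names are made up for illustration):
-- hypothetical table matching the CSV's columns
create table users (id int, name text, signed_up date);
-- \copy reads /tmp/users.csv on the client and streams the rows in
\copy users from '/tmp/users.csv' CSV HEADER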
Let's say you have a file with a ton of JSON documents, one per line. To load this up, you could do something like this:
create temp table the_temp (line text);
\copy the_temp from '/home/simon/local/some/path/file_with_lines.txt' CSV QUOTE E'\b' DELIMITER E'\x01';
create table the_actual_table (a_column_containing_json jsonb);
insert into the_actual_table (a_column_containing_json) select line::jsonb from the_temp;
We first create a table for the lines of text; then we hack the CSV parser so it doesn't try to un-escape the JSON (the quote character is set to a backspace and the delimiter to the 0x01
character, both of which you're unlikely to find anywhere in real data... if you pick characters that do show up, your quotes might end up being eaten in the process).
And once it's a database table, you're free to process it further within postgres.
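For instance, you could pull fields out of the jsonb column or index it for containment queries (the "user" key below is just a made-up example of what your documents might contain):
-- extract a top-level key as text ("user" is a hypothetical field name)
select a_column_containing_json->>'user' from the_actual_table;
-- a GIN index speeds up @> containment queries over the documents
create index on the_actual_table using gin (a_column_containing_json);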
Unlike the COPY command (the one without the backslash), which reads files on the database server itself, \copy reads the file on the client side, so it does happen to work over the network & you don't need any fancy root / admin rights for it either.
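In other words, you can run the whole load from your laptop against a remote database; a sketch, with the host, database and user obviously made up:
psql -h db.example.com -d mydb -U simon \
     -c "\copy the_temp from 'file_with_lines.txt' CSV QUOTE E'\b' DELIMITER E'\x01'"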
And it's fast.
As for why we're doing this... behold the dual terrors of overly short posts and cliffhangers.