How to: directly query data from DuckDB

Using dcwatch script

The dcwatch.py python script allows to run quick actions to read or write in the dataset locally.

Ensure you have the required libraries installed then simply run your queries with the sql command for read only actions, such as:

./dcwatch.py sql "SELECT id, name, city_name FROM datacenters"

./dcwatch.py sql "SELECT * FROM companies"

To write data in the dump folder (to prepare a merge request for example), run the sqldump command:

./dcwatch.py sqldump "INSERT INTO companies (id, name) VALUES (789, 'New datacenter company that didn't exist)"

Using Python code

First you need DuckDB installed.

pip3 install duckdb

Then import the database schema by running the schema.sql script:

import duckdb

with open("schema.sql", 'r') as data:
    script = data.read()
    duckdb.sql(script)

Or simply import the functions available in the lib module in the repository:


from main import load_schema

load_schema()

Then import the data from the dump folder by running the data.sql script:

with open("data.sql", 'r') as data:
    script = data.read()
    duckdb.sql(script)

Or simply import the functions available in the lib module in the repository:


from main import load_schema, load_data

load_schema()
load_data()

DuckDB (the library to manage the data as an SQL schema behind the scenes) works only in memory by default. To actually write changes on disk, in the dump folder, use the dump_data function:


from main import load_schema, load_data, dump_data
import duckdb

load_schema()
load_data()

# do something
duckdb.sql("INSERT INTO companies (id, name) VALUES (789, 'New datacenter company that didn't exist)")

# save on disk
dump_data()

Using the SQL scripts behind the scenes

This is less direct and more prone to errors, but so you know its there:

schema.sql tells DuckDB the schema and constraints to consider before importing data from the existing CSV files, or editing data in-memory. See more on the database schema here.

data.sql tells DuckDB where to find the relevant data, already stored in CSV files, for each table in the schema.

COPY progress_steps FROM 'dump/progress_steps.csv';
COPY land_plots FROM 'dump/land_plots.csv';
COPY buildings FROM 'dump/buildings.csv';
COPY campuses FROM 'dump/campuses.csv';
COPY companies FROM 'dump/companies.csv';
COPY cooling_technologies FROM 'dump/cooling_technologies.csv';
COPY countries FROM 'dump/countries.csv';
COPY datacenters FROM 'dump/datacenters.csv';
COPY estimations FROM 'dump/estimations.csv';
COPY 'groups' FROM 'dump/groups.csv';
COPY log_table FROM 'dump/log_table.csv';
COPY electricity_generation_source FROM 'dump/electricity_generation_source.csv';
COPY datacenter_categories FROM 'dump/datacenter_categories.csv';
COPY company_belongs_to_group FROM 'dump/company_belongs_to_group.csv';
COPY company_operates_datacenter FROM 'dump/company_operates_datacenter.csv';
COPY datacenter_is_composed_of_buildings FROM 'dump/datacenter_is_composed_of_buildings.csv';
COPY datacenter_is_in_campus FROM 'dump/datacenter_is_in_campus.csv';
COPY datacenter_is_in_categories FROM 'dump/datacenter_is_in_categories.csv';
COPY datacenter_uses_cooling_technologies FROM 'dump/datacenter_uses_cooling_technologies.csv';
COPY datacenter_uses_electricity_generation FROM 'dump/datacenter_uses_electricity_generation.csv';
COPY estimation_has_been_made_for_datacenter FROM 'dump/estimation_has_been_made_for_datacenter.csv';

SELECT * FROM datacenters;

Keyboard shortcuts

DCWatch documentation

How to: directly query data from DuckDB

Using dcwatch script

Using Python code

Using the SQL scripts behind the scenes