Redshift comprises one or more nodes; together these nodes form a Redshift cluster. Cluster management is controlled through AWS Identity and Access Management (IAM). Multiple databases can be created in a single cluster.
Database and table permissions are handled just like any other Postgres database.
Redshift uses a variety of techniques to facilitate efficient storage and querying of petabytes of data.
- Redshift does not use indexes. Primary keys can be specified but are not enforced. Primary keys are used for query optimization only. This makes the writes to Redshift incredibly fast.
- To compensate for the absence of indexes, Redshift employs a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations across the nodes in a cluster. Redshift has a leader node that receives each query. Based on the query, the leader node requests data from 1..n compute nodes. The nodes run the query steps in parallel and return results to the leader node, which passes the final result back to the client.
- Column data can be compressed automatically, or the administrator can specify the most suitable compression encoding for each column.
- Redshift is a columnar store. A single column can be projected from a table without having to read any other column in each table row.
- A parameter group defines global settings for all the databases created in a cluster.
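As a quick illustration of the columnar-storage and compression points above, a table can declare a compression encoding per column. The table and column names below are hypothetical, and in practice the encodings would be chosen with ANALYZE COMPRESSION rather than by hand:

create table event_log (
    event_id    bigint        encode delta,
    event_time  timestamp     encode delta32k,
    event_type  varchar(32)   encode bytedict,
    payload     varchar(256)  encode lzo
);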
Tools
- SQL Manager Lite tool from EMS Software
- SQL Workbench
- PostgreSQL ODBC driver
- Npgsql is the .NET data provider for PostgreSQL
- For cluster management, use the AWS CLI
Best Practices - Cluster
- Create at least two nodes per cluster. If you use a single-node cluster and if the node fails, the only possible data recovery is from the last snapshot of the cluster.
- Be aware of the maintenance window. Once a week, a 30-minute window is specified when Redshift patches are applied. You can pick the time and day of week for this maintenance window.
- Take a snapshot of the cluster before performing any maintenance
- Make sure you can back up and restore the cluster. The cluster is available (not read-only) during a restore; it is read-only during a resize.
Best Practices - Security
- Use a VPC security group that allows TCP traffic only on port 5439 (the Redshift port). Require SSL for access.
- Security at database level works like any other relational database:
- Create user groups.
- Grant permissions to these groups.
- Associate users with groups.
- Only provide SELECT permission to report users.
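A minimal group-based setup along these lines might look like the following; all names and the password are placeholders:

create group report_users;
create user report_joe password '<password>';   -- placeholder; must meet Redshift password rules
alter group report_users add user report_joe;
grant select on all tables in schema public to group report_users;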
Joins
- Nested Loop (worst)
- Hash Join
- Merge Join (best; used when the joining rows reside on the same slice and are sorted on the join columns)
Best Practices - Loading Large Data Set
- The COPY command is the fastest way to load data. Data can be loaded from S3, DynamoDB, or remote hosts over SSH (see the COPY sketch at the end of this list).
- Do not use one big file; split it into multiple files to utilize the hardware better.
- When preparing a data load file, escape the delimiter in text fields. Replace CRLF/LF with a different character. If \ is used as the ESCAPE character, be sure to escape \ in text fields (\\). When exporting from another database, remove non-printable characters from the data (for example, explicitly specify code page 1252 in the SQL Server BCP utility).
- Use the NOLOAD keyword with a COPY command to validate the data in the input files before actually loading the data.
- Load a sample of the data (a few million rows) with COMPUPDATE and STATUPDATE on, and see which compression encodings Redshift chooses. Then explicitly specify the column encodings in the CREATE TABLE statement, dropping and recreating the table with the new encodings.
- If possible, presort data in the big data file (use the same as the table sort key).
- When loading time series data, use time as the sort key and pre-sort the data file before load to speed up load.
- Split the big data load file into smaller files; Redshift can load files in parallel. If a single file is too large, the whole process may abort abruptly without a clear error (currently a single file containing over 250 million rows fails to load).
- When loading data from S3 into Redshift, be aware that S3 is eventually consistent. If a large number of files are being uploaded to S3, they might become visible in an order different from the upload order. Use a manifest file to specify exactly which files to load and to handle retries.
- Compress each file before uploading to S3
- Turn COMPUPDATE and STATUPDATE off during the load
- Run ANALYZE after the load to update system statistics; without the update, query optimization suffers.
- Check distribution of rows amongst slices after load (see query below)
- Don't clear and reload
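The COPY sketch referenced above, following the practices in this list. The bucket, prefix, table name, IAM role, and delimiter are placeholders; adjust them to your data:

-- validate the input files first without loading any rows
copy <tablename>
from 's3://<bucket>/<prefix>.manifest'
iam_role 'arn:aws:iam::<account-id>:role/<redshift-role>'
manifest gzip delimiter '|' escape
noload;

-- actual load with compression analysis and stats updates turned off
copy <tablename>
from 's3://<bucket>/<prefix>.manifest'
iam_role 'arn:aws:iam::<account-id>:role/<redshift-role>'
manifest gzip delimiter '|' escape
compupdate off
statupdate off;

-- refresh planner statistics after the load
analyze <tablename>;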
Best Practices - Schema Management
- Design schema that is optimal for storage and query of data.
- Define public views for end users; this shields them from changes in the underlying schema.
- Run the ANALYZE and VACUUM commands on a regular basis, especially after you insert, update, or delete a significant number of rows in a table.
- Use distribution keys that evenly distribute data across nodes while keeping data that needs to be joined together on the same node.
- Sort key: use the attribute you will most often filter on.
- Distribution key: use the attribute on which you will be joining data. For example, if tables A and B will be joined on a column COL1, set COL1 as the distribution key for BOTH tables so that the rows to be joined stay on the same node (see the CREATE TABLE sketch after this list).
- Use the sort keys when joining tables, if possible.
- For lookup tables, DISTSTYLE ALL can be specified to make a copy of the table available in the first slice of each node (helps JOINs)
- It is slices, rather than nodes, that determine the degree of parallelism and therefore performance.
- Use EXPLAIN to understand the query plan. Query performance is logged in system tables.
- Do not use Select *. Instead, specify desired columns explicitly (Redshift is a columnar database).
- Instead of deleting from or vacuuming an existing table, it may be quicker to create a new table containing only the desired data from the existing table.
- Workload Management - queries can be routed to different queues to meet expected run times
- Can write UDFs in Python
- Run ANALYZE COMPRESSION weekly
- Run VACUUM weekly
- Don't compress sort keys
- Don't create columns wider than needed (e.g., oversized VARCHARs); it wastes memory.
- Interleaved sort: sorts by each column individually and combines them via an internal function into a single block mapping. Use the INTERLEAVED keyword in CREATE TABLE; run VACUUM REINDEX to re-optimize the interleaved sort.
- Use caching layer for dashboard
- Use multiple queues for queries
- Redshift Console - can view query plan graphically
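The CREATE TABLE sketch referenced above. Table and column names are hypothetical and only illustrate the distribution-key, sort-key, DISTSTYLE ALL, and interleaved-sort choices discussed in this list:

create table orders (
    order_id    bigint,
    customer_id integer,
    order_date  date
)
distkey (customer_id)   -- the column both large tables are joined on
sortkey (order_date);   -- the column most queries filter on

create table order_lines (
    order_line_id bigint,
    customer_id   integer,
    quantity      integer
)
distkey (customer_id);  -- co-locates rows that join on customer_id

-- small lookup table copied to every node
create table country_codes (
    code varchar(2),
    name varchar(64)
)
diststyle all;

-- interleaved sort key; re-optimize periodically with VACUUM REINDEX
create table events (
    user_id  integer,
    event_dt date,
    category varchar(32)
)
interleaved sortkey (user_id, event_dt);

vacuum reindex events;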
Connectivity
JDBC dbUrl
jdbc:redshift://<clusterurl>:5439/<schemaname>;AuthMech=PREFER
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// requires the Amazon Redshift (or PostgreSQL) JDBC driver on the classpath
Properties props = new Properties();
props.setProperty("user", <username>);
props.setProperty("password", <pwd>);
Connection conn = DriverManager.getConnection(<dbUrl>, props);
Escape single quotes in SQL with \.
Example: \'2016-03-10\'
In Java String variable, the value should be \\'2016-03-10\\'
ODBC
Driver={Amazon Redshift (x86)}; Server=<clusterurl>; Database=<schemaname>; UID=<username>; PWD=<password>; Port=5439; sslmode=allow
Useful queries
http://github.com/awslabs/amazon-redshift-utils
Database Table attributes
-- note the compression encoding applied to each column
select "column", type, encoding, distkey, sortkey
select "column", type, encoding, distkey, sortkey
from pg_table_def where tablename = '<tablename>';
Data Usage
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
Data Distribution
-- how the data for a particular table is distributed amongst slices
select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name='<tablename>' and col =0
order by slice, col;
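A complementary skew check, assuming the svv_table_info system view is available in your cluster version:

-- tables with the most uneven row distribution across slices
select "table", diststyle, tbl_rows, skew_rows
from svv_table_info
order by skew_rows desc;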
Database object information
- pg_table_def - table and column definitions
- stv_tbl_perm - information about tables including temporary tables
- stv_locks - current table updates
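A quick check against stv_locks for in-progress table updates; the column list below is an assumed subset, so verify it against your cluster's version of the view:

select table_id, last_update, lock_owner, lock_owner_pid, lock_status
from stv_locks
order by last_update;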
Dependencies
SELECT * FROM find_depend;
Create temporary tables
-- temporary tables are session-specific (global temporary tables are not supported)
CREATE TEMPORARY TABLE a_temp_table (like existing_table_name);
-- Create table from another table with data
select * into new_table from existing_table;
Users and Groups
SELECT groname, usename FROM pg_group, pg_user WHERE usesysid = any(grolist);
select * from pg_user;
SELECT * from pg_tables;
SELECT tableowner FROM pg_tables
GROUP BY tableowner
order BY tableowner;
SELECT * FROM pg_views;
SELECT VIEWOWNER FROM pg_views
GROUP BY viewowner
order BY viewowner;
Use ALTER command to change owner of a table or view (only admin can change owner).
Permissions
GRANT SELECT ON [view or table name] TO GROUP endusers;
GRANT ALL ON [view or table name] TO joe;
Queues and Queries
select * from wlm_queue_state_vw;

create view WLM_QUERY_STATE_VW as
select query, (service_class-5) as queue, slot_count, trim(wlm_start_time) as start_time, trim(state) as state, trim(queue_time) as queue_time, trim(exec_time) as exec_time
from stv_wlm_query_state;
select * from wlm_query_state_vw;
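To route a session's queries to a particular WLM queue and temporarily claim more memory slots; the query group name is a placeholder that must exist in your WLM configuration:

set query_group to '<wlm-query-group>';
set wlm_query_slot_count to 3;
-- run the heavy query here
reset wlm_query_slot_count;
reset query_group;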
Running Queries
SELECT pid, trim(user_name), starttime, substring(query,1,20)
FROM stv_recents
WHERE status='Running';
-- Queries in progress
SELECT slice, query, trim(label) querylabel, pid,
starttime, substring(text,1,20) querytext
FROM stv_inflight;
-- Queries whose memory usage spilled to disk
SELECT query, step, rows, workmem, label, is_diskbased
FROM svl_query_summary
WHERE is_diskbased='t' order by workmem;
Query Performance
EXPLAIN SELECT u.gender, g.gamename, count(*) AS purchases
FROM user_profile AS u, games AS g, order_line AS ol
GROUP BY u.gender, g.gamename
ORDER BY purchases DESC;
S3 Copy Audit
- stl_s3client - time taken to load
- stl_load_errors - errors encountered during load
- stl_loaderror_detail - details of load errors
- STV_load_state - current state of copy command and completion percentage
- STL_tr_conflict - for troubleshooting locking issues
- STL_file_scan - which nodes were accessed during copy
- STL_load_commits - query, filename, number of rows, slice affected by the load
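A hedged example of auditing the most recent COPY in the current session via STL_load_commits (column subset assumed):

select query, trim(filename) as filename, lines_scanned, status, curtime
from stl_load_commits
where query = pg_last_copy_id()
order by filename;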
Maintenance
- svl_vacuum_percentage - how much space can be reclaimed. use VACUUM command (options: FULL, SORT ONLY, DELETE ONLY)
- stv_tbl_perm - row count
- SVV_diskusage - storage allocation for tables and databases at the block level
- select sum(used-tossed) as used, sum(capacity) as capacity from stv_partitions -- disk space used
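A typical maintenance pass might look like the following; the table name is a placeholder:

-- see how much space a vacuum could reclaim
select * from svl_vacuum_percentage;
-- reclaim space and re-sort rows, then refresh statistics
vacuum full <tablename>;
analyze <tablename>;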
Monitoring
- Select * from stl_ddltext where userid > 1 order by pid, starttime, sequence -- who ran a DDL query
- STL_wlm_error - errors
- STL_wlm_query - queries
- STV_wlm_classification_config - current WLM configuration values
- STV_wlm_query_queue_state - current queue status
- STV_wlm_query_state - state of queries
- stv_recents - running queries
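To see how long completed queries waited in a queue versus executed (times in STL_wlm_query are in microseconds):

select query, service_class, slot_count,
       total_queue_time / 1000000 as queue_seconds,
       total_exec_time / 1000000 as exec_seconds
from stl_wlm_query
order by total_queue_time desc
limit 10;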
Recent Connections and Queries
select * from stl_connection_log where remotehost like '52%' order by recordtime desc;
select * from stl_query order by endtime desc limit 10;
select * from stl_load_errors order by starttime desc;
select * from stl_error where userid!=0 order by recordtime desc;