sql - Redshift SELECT DISTINCT returns repeated values

March 15, 2013

I have a database where each object property is stored in a separate row. The attached query does not return distinct values in a Redshift database, but works as expected when tested in any MySQL-compatible database.

SELECT DISTINCT distinct_value
FROM (
    SELECT
        uri,
        ( SELECT DISTINCT value_string
          FROM `test_organization__app__testsegment` AS x
          WHERE x.uri = parent.uri
            AND name = 'hasteststring'
            AND parent.value_string IS NOT NULL ) AS distinct_value
    FROM `test_organization__app__testsegment` AS parent
    WHERE uri IN ( SELECT uri
                   FROM `test_organization__app__testsegment`
                   WHERE name = 'types'
                     AND value_uri_multivalue = 'document' )
) AS t
WHERE distinct_value IS NOT NULL
ORDER BY distinct_value ASC
LIMIT 10000 OFFSET 0

This is not a bug. The behavior is intentional, though not straightforward.

In Redshift, you can declare constraints on tables, but Redshift doesn't enforce them, i.e. it allows duplicate values if you insert them. The difference here is that when you run a SELECT DISTINCT query against a column that doesn't have a primary key declared, it will scan the whole column and get the unique values, whereas if you run the same query on a column that has a primary key constraint, it will return the output without scanning. This is how you can get duplicate entries back if you inserted them.
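The behavior described above can be sketched roughly as follows. The table `demo_pk` and its columns are hypothetical and are not taken from the question. Whether the `SELECT DISTINCT` actually returns the duplicate depends on the plan the optimizer chooses, so treat this as an illustration rather than a guaranteed reproduction.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE demo_pk (
    id  INTEGER PRIMARY KEY,   -- declared, but not enforced by Redshift
    val VARCHAR(16)
);

-- Both rows are accepted even though they share the same id.
INSERT INTO demo_pk VALUES (1, 'a');
INSERT INTO demo_pk VALUES (1, 'a');

-- The planner may trust the declared constraint and skip de-duplication,
-- so this can return id = 1 more than once.
SELECT DISTINCT id FROM demo_pk;

-- A grouped count is a common way to check whether duplicates exist.
SELECT id, COUNT(*) AS copies
FROM demo_pk
GROUP BY id
HAVING COUNT(*) > 1;
```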

Why is it done this way? Redshift is optimized for large datasets, and it's much faster to copy data if you don't need to check constraint validity for every row that you copy or insert. If you want, you can declare a primary key constraint as part of your data model, but you will need to support it explicitly, either by removing duplicates or by designing your ETL so that no duplicates are loaded.
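One possible way to remove duplicates explicitly is to load into a staging table and keep a single row per key. This is only a sketch under assumptions: the names `staging_events`, `events_clean`, `id`, `val`, and `loaded_at` are hypothetical, and "keep the most recently loaded row" is just one reasonable rule for choosing the survivor.

```sql
-- Hypothetical names: staging_events is the raw load target (no constraints
-- declared), events_clean is the de-duplicated copy.
CREATE TABLE events_clean AS
SELECT id, val, loaded_at
FROM (
    SELECT id, val, loaded_at,
           -- Number the rows within each id, newest first.
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC) AS rn
    FROM staging_events
) ranked
WHERE rn = 1;   -- keep only the most recent row per id
```

Because the staging table declares no primary key, the de-duplication step does not rely on a constraint that the planner might trust.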

More information, with specific examples, is in the Heap blog post "Redshift Pitfalls and How to Avoid Them".
