MS SQL Server - how to ignore/discard very rare duplicate rows during table insert
I discovered an error in a client API. It sends pages of unique records based on timestamps. Every so often, it sends the same exact record at the end of one page and again at the start of the next page.
I was operating under the assumption that there would be no duplicates (this is an event log, after all). Therefore I am doing pure inserts, instead of going through the trouble of an update or some sort of 'on duplicate' step.
My question: what's the best way to get rid of these duplicates and have MS SQL quietly discard the duplicate records upon insert? The duplicates are inserted after the original has been written in a transaction. They happen about once every 10,000 rows, so something light on memory would be great.
The only distinct part of the records is the "ETL batch id"; everything else is identical. Otherwise each record is supposed to have a unique "event id". If I make the event id a unique index/key, can I somehow force MS SQL to dump the duplicate keys? Or will the application throw an error message and stop during the query?
There is the infamous ignore_dup_key index option, which answers the question. I try to avoid using it because it changes insert semantics in a subtle and hard to discover way.
It is efficient, though: see http://blogs.msdn.com/b/craigfr/archive/2008/01/30/maintaining-unique-indexes-with-ignore-dup-key.aspx and http://sqlblog.com/blogs/paul_white/archive/2013/02/01/a-creative-use-of-ignore-dup-key.aspx.
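As a minimal sketch of what the ignore_dup_key option looks like in practice: the table `dbo.EventLog`, its columns, and the index name below are hypothetical, chosen only to mirror the "event id" and "ETL batch id" fields described in the question.

```sql
-- Hypothetical event log table; all names are assumptions for illustration.
CREATE TABLE dbo.EventLog (
    EventId    BIGINT        NOT NULL,
    EtlBatchId INT           NOT NULL,
    Payload    NVARCHAR(400) NULL
);

-- Unique index on the event id that silently drops duplicate keys on insert.
CREATE UNIQUE INDEX UX_EventLog_EventId
    ON dbo.EventLog (EventId)
    WITH (IGNORE_DUP_KEY = ON);

-- First page: the original record is written.
INSERT INTO dbo.EventLog (EventId, EtlBatchId, Payload)
VALUES (1001, 1, N'login');

-- Next page repeats event 1001. The repeated row is discarded with a
-- "Duplicate key was ignored." warning; the new row 1002 is still inserted.
INSERT INTO dbo.EventLog (EventId, EtlBatchId, Payload)
VALUES (1001, 2, N'login'),
       (1002, 2, N'logout');

-- Only one row per EventId remains; the first-written row for 1001 is kept.
SELECT EventId, EtlBatchId
FROM dbo.EventLog
ORDER BY EventId;
```

The subtle part is that the second insert reports fewer affected rows than it was given and raises only a warning, so code that relies on row counts or on errors to detect problems will not notice the dropped row.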
"If I make the event id a unique index/key, can I somehow force MS SQL to dump the duplicate keys? Or will the application throw an error message and stop during the query?"
Yes, the insert will fail. That is the preferred way in most situations. If you can't make that work, ignore_dup_key might be the next best thing.
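To illustrate the failing-insert behaviour, here is a hedged sketch using a plain unique index without ignore_dup_key. The table `dbo.EventLogStrict` and its columns are hypothetical, and catching the error in T-SQL is just one possible way to handle it; the application could equally catch it on its side.

```sql
-- Hypothetical table; all names are assumptions for illustration.
CREATE TABLE dbo.EventLogStrict (
    EventId    BIGINT NOT NULL,
    EtlBatchId INT    NOT NULL
);

-- Plain unique index: a duplicate event id raises an error.
CREATE UNIQUE INDEX UX_EventLogStrict_EventId
    ON dbo.EventLogStrict (EventId);

INSERT INTO dbo.EventLogStrict (EventId, EtlBatchId)
VALUES (1001, 1);

BEGIN TRY
    -- Repeated event id: this statement fails and inserts nothing.
    INSERT INTO dbo.EventLogStrict (EventId, EtlBatchId)
    VALUES (1001, 2);
END TRY
BEGIN CATCH
    -- 2601 = duplicate key in a unique index,
    -- 2627 = unique or primary key constraint violation.
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Duplicate event id skipped.';
    ELSE
        THROW;  -- re-raise anything that is not a duplicate-key error
END CATCH;
```

Note that a failed statement is rolled back as a whole: if a multi-row insert contains one duplicate, none of its rows are written, so with this approach the rows would need to be inserted individually or de-duplicated before the insert.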