cql3 - What is the best way to perform a subtraction on Cassandra by cqlsh request? -


i trying make cqlsh request on cassandra. idea analyse average length different 'gene' registers had been annotated in genome. after enter data, have athaliana.tab table:

 id                                   | chr | comments          | end      | orf | sense | start    | type --------------------------------------+-----+-------------------+----------+-----+-------+----------+------  d2ab2520-6734-11e5-955c-234085c1edec |   1 | gene_id at1g16340 |  5590338 |   0 |     - |  5590241 |  cds  d4169c00-6734-11e5-955c-234085c1edec |   1 | gene_id at1g16610 |  5676495 |   . |     - |  5676429 | exon  a8c792c0-6734-11e5-955c-234085c1edec |   1 | gene_id at1g07485 |  2301889 |   0 |     + |  2301665 |  cds  3bd5c0a0-6735-11e5-955c-234085c1edec |   1 | gene_id at1g51980 | 19326916 |   . |     - | 19326733 | exon  263b5b60-6735-11e5-955c-234085c1edec |   1 | gene_id at1g44990 | 17007808 |   . |     - | 17007542 | gene  67989a50-6735-11e5-955c-234085c1edec |   1 | gene_id at1g63110 | 23405144 |   . |     + | 23404821 |  utr  26f7f4a0-6735-11e5-955c-234085c1edec |   1 | gene_id at1g45180 | 17101207 |   0 |     + | 17101109 |  cds  3743dc70-6735-11e5-955c-234085c1edec |   1 | gene_id at1g50840 | 18841644 |   0 |     + | 18840965 |  cds  e5099940-6734-11e5-955c-234085c1edec |   1 | gene_id at1g20620 |  7145780 |   . |     + |  7145691 | exon  2ba30620-6735-11e5-955c-234085c1edec |   1 | gene_id at1g48180 | 17793717 |   . |     - | 17792449 | gene 

the idea obtain subtraction between end-start columns data , calculate average. have tried way:

select avg(end-start) athaliana.tab chr = '1' , type = 'gene'; 

but results in:

syntaxexception: <errormessage code=2000 [syntax error in cql query] message="line 1:14 no viable alternative @ input '-' (select avg([end]-...)"> 

some ideas? appreciated.

cassandra designed maximum read , write efficiency , has no built-in support functions applying multiple columns such plus or minus nor aggregates, i.e., functions applying multiple rows, such avg or count.

until recently, workaround store results of functions in separate column , perform aggregations in own application code.

however, cassandra 2.2 , 3.0 offer user-defined functions (udfs) , user-defined aggregates (udas). can define own functions, have cassandra invoke them you. see cql spec chapters on udfs , udas or this blog post.

first, create following functions , aggregates:

use athaliana;  create function minus (x int, y int) returns null on null input returns int language java 'return x-y;';  create function avgstate (state tuple<int,bigint>, val int) called on null input returns tuple<int,bigint>  language java '     if (val != null) {       state.setint(0, state.getint(0)+1);       state.setlong(1, state.getlong(1)+val.intvalue());     }     return state; ';  create function avgfinal (state tuple<int,bigint>) called on null input returns double language java '     double r = 0;     if (state.getint(0) == 0) return null;     r = state.getlong(1);     r /= state.getint(0);     return double.valueof(r);   ';  create aggregate avg(int) sfunc avgstate stype tuple<int,bigint> finalfunc avgfinal initcond (0, 0); 

then query rewritten way:

select avg(minus(end,start)) athaliana.tab chr = '1' , type = 'gene'; 

note should use aggregates on single partition key; if not case, warning raised.


Comments

Popular posts from this blog

html - Outlook 2010 Anchor (url/address/link) -

javascript - Why does running this loop 9 times take 100x longer than running it 8 times? -

Getting gateway time-out Rails app with Nginx + Puma running on Digital Ocean -