cql3 - What is the best way to perform a subtraction on Cassandra by cqlsh request? -
i trying make cqlsh request on cassandra. idea analyse average length different 'gene' registers had been annotated in genome. after enter data, have athaliana.tab table:
id | chr | comments | end | orf | sense | start | type --------------------------------------+-----+-------------------+----------+-----+-------+----------+------ d2ab2520-6734-11e5-955c-234085c1edec | 1 | gene_id at1g16340 | 5590338 | 0 | - | 5590241 | cds d4169c00-6734-11e5-955c-234085c1edec | 1 | gene_id at1g16610 | 5676495 | . | - | 5676429 | exon a8c792c0-6734-11e5-955c-234085c1edec | 1 | gene_id at1g07485 | 2301889 | 0 | + | 2301665 | cds 3bd5c0a0-6735-11e5-955c-234085c1edec | 1 | gene_id at1g51980 | 19326916 | . | - | 19326733 | exon 263b5b60-6735-11e5-955c-234085c1edec | 1 | gene_id at1g44990 | 17007808 | . | - | 17007542 | gene 67989a50-6735-11e5-955c-234085c1edec | 1 | gene_id at1g63110 | 23405144 | . | + | 23404821 | utr 26f7f4a0-6735-11e5-955c-234085c1edec | 1 | gene_id at1g45180 | 17101207 | 0 | + | 17101109 | cds 3743dc70-6735-11e5-955c-234085c1edec | 1 | gene_id at1g50840 | 18841644 | 0 | + | 18840965 | cds e5099940-6734-11e5-955c-234085c1edec | 1 | gene_id at1g20620 | 7145780 | . | + | 7145691 | exon 2ba30620-6735-11e5-955c-234085c1edec | 1 | gene_id at1g48180 | 17793717 | . | - | 17792449 | gene
the idea obtain subtraction between end-start columns data , calculate average. have tried way:
select avg(end-start) athaliana.tab chr = '1' , type = 'gene';
but results in:
syntaxexception: <errormessage code=2000 [syntax error in cql query] message="line 1:14 no viable alternative @ input '-' (select avg([end]-...)">
some ideas? appreciated.
cassandra designed maximum read , write efficiency , has no built-in support functions applying multiple columns such plus
or minus
nor aggregates, i.e., functions applying multiple rows, such avg
or count
.
until recently, workaround store results of functions in separate column , perform aggregations in own application code.
however, cassandra 2.2 , 3.0 offer user-defined functions (udfs) , user-defined aggregates (udas). can define own functions, have cassandra invoke them you. see cql spec chapters on udfs , udas or this blog post.
first, create following functions , aggregates:
use athaliana; create function minus (x int, y int) returns null on null input returns int language java 'return x-y;'; create function avgstate (state tuple<int,bigint>, val int) called on null input returns tuple<int,bigint> language java ' if (val != null) { state.setint(0, state.getint(0)+1); state.setlong(1, state.getlong(1)+val.intvalue()); } return state; '; create function avgfinal (state tuple<int,bigint>) called on null input returns double language java ' double r = 0; if (state.getint(0) == 0) return null; r = state.getlong(1); r /= state.getint(0); return double.valueof(r); '; create aggregate avg(int) sfunc avgstate stype tuple<int,bigint> finalfunc avgfinal initcond (0, 0);
then query rewritten way:
select avg(minus(end,start)) athaliana.tab chr = '1' , type = 'gene';
note should use aggregates on single partition key; if not case, warning raised.
Comments
Post a Comment