SAS: unexpected truncation when recoding a string
When I try to recode a string to another string, I do not get the expected result.
Data:
    data test1;
        input c_tnm_t $ ;
        datalines;
    azcd11
    azcd10
    azcd12
    azcd13
    azcd131
    azcd13a
    azcd13a1
    azcd13a2
    azcd13b
    azcd13b1
    azcd13b2
    azcd13c
    azcd14
    ;
I'm trying to recode azcd12 to 'is':
    data test2;
        set test1;
        if c_tnm_t = 'azcd11' then _33_ct_temp = 'a';
        if c_tnm_t = 'azcd10' then _33_ct_temp = '0';
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;
But 'azcd12' is instead recoded to 'i' (as in the picture below). Why is this?
If I recode only 'azcd12', the result is as expected:
    data test2;
        set test1;
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;
PS. Feel free to edit the title if you have a suggestion for a better description of the problem.
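This is a classic conundrum that SAS users face at one point or another, and it can be a bit confusing because of how SAS works compared to other languages. When you assign a value to the new variable _33_ct_temp, you are initializing it at the same time. SAS gives the variable the length of the first value assigned to it in the program (this is fixed when the step is compiled, from the first statement that references the variable), and it decides whether the variable is a string or a number based on the value being assigned.
Consider the 3 variables initialized in this program: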
    data test;
        numvar = 50;
        charvar1 = 'a';
        charvar2 = 'ab';
    run;
Running proc contents on this dataset will show:
    Variable    Type    Len
    charvar1    Char      1
    charvar2    Char      2
    numvar      Num       8
These are the default assignments SAS gives on initialization. Notice that the numeric variable is automatically assigned a length of 8 bytes, while the lengths of the character variables differ depending on the length of the first value. Any subsequent assignment of more than 1 character (or byte) to charvar1 will result in the value being truncated to 1 character. This is why you are seeing this phenomenon in your data.
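The truncation can be reproduced in isolation. The following is a minimal sketch; the dataset name demo is hypothetical and used only for illustration, and charvar1 mirrors the variable from the program above.

```sas
/* Minimal sketch: the dataset name "demo" is a hypothetical name for illustration. */
data demo;
    charvar1 = 'a';     /* first reference in the step: length is fixed at 1 */
    charvar1 = 'abc';   /* longer value is silently truncated to fit length 1 */
    put charvar1=;      /* writes charvar1=a to the log */
run;
```

No error is raised for the second assignment, which is what makes the problem easy to miss.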
In your first test2 dataset, _33_ct_temp is first assigned the value a. As in the program above, this initializes it with a length of 1, truncating the later value is to i. In your second test2 dataset, _33_ct_temp is first assigned the value is, giving it a length of 2.
To get around this problem, you want to initialize the character variable first with the maximum length you think it will need. Since space isn't much of a concern anymore, you can be a lot more liberal with the assignment. Of course, you could scan the column to find the maximum possible length, but if you have a massive dataset and not a whole lot of computing resources, it usually isn't worth it.
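If you do want to scan an existing column for its longest value, one possible approach is a PROC SQL query. This is only a sketch: it uses test1 and c_tnm_t from the question as the example column, and the alias max_len is a name chosen here for illustration.

```sas
/* Sketch: report the longest value stored in an existing character column. */
/* test1 and c_tnm_t come from the question; max_len is an illustrative alias. */
proc sql;
    select max(length(c_tnm_t)) as max_len
    from test1;
quit;
```

This reads every row, which is the cost mentioned above. It applies when the new variable is filled from an existing column; when the values are literals in your own code, as in the recode above, the longest literal already tells you the length you need.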
You can set the length with a length statement, either at the beginning of the program or before you assign the first value to the variable:
    data test2;
        set test1;
        /* declare the length before the first assignment */
        length _33_ct_temp $2;
        if (c_tnm_t = 'azcd11') then _33_ct_temp = 'a';
        if (c_tnm_t = 'azcd10') then _33_ct_temp = '0';
        if (c_tnm_t = 'azcd12') then _33_ct_temp = 'is';
    run;
You can also use the length statement as a way to set the column order of your variables. Columns are laid out in the order in which the variables are initialized. If you changed the above program to:
    data test2;
        /* length before set: _33_ct_temp becomes the first column */
        length _33_ct_temp $2;
        set test1;
        if (c_tnm_t = 'azcd11') then _33_ct_temp = 'a';
        if (c_tnm_t = 'azcd10') then _33_ct_temp = '0';
        if (c_tnm_t = 'azcd12') then _33_ct_temp = 'is';
    run;
You will find that _33_ct_temp comes first. I use this trick a lot, particularly with large datasets containing lots of ID variables or dates. For example:
    data a;
        length date hour minute second cust_id trans_id 8
               first_name last_name $30;
        set have;
        <code>
    run;
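To confirm the resulting lengths and column order, PROC CONTENTS can list variables by their position in the dataset instead of alphabetically. A minimal sketch, assuming the test2 dataset from the earlier example has already been created:

```sas
/* Sketch: list variables in creation order, with type and length. */
/* Assumes test2 exists from the earlier data step. */
proc contents data=test2 varnum;
run;
```

With the length statement placed before set, _33_ct_temp should appear as variable number 1 with a length of 2.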