SAS: unexpected truncation when recoding a string


When I try to recode one string to another string, I do not get the expected result.

Data:

    data test1;
        input c_tnm_t $;
        datalines;
    azcd11
    azcd10
    azcd12
    azcd13
    azcd131
    azcd13a
    azcd13a1
    azcd13a2
    azcd13b
    azcd13b1
    azcd13b2
    azcd13c
    azcd14
    ;

I'm trying to recode azcd12 as 'is':

    data test2;
        set test1;
        if c_tnm_t = 'azcd11' then _33_ct_temp = 'a';
        if c_tnm_t = 'azcd10' then _33_ct_temp = '0';
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;

But 'azcd12' is instead recoded as 'i'. Why is this?

If I recode only 'azcd12', the result is as expected:

    data test2;
        set test1;
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;


P.S. Feel free to edit the title if you have a suggestion for a better description of the problem.

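This is a classic conundrum that SAS users face at one point or another, and it can be a bit confusing because of how SAS works compared to other languages. When you assign a value to the new variable _33_ct_temp, you are initializing it at the same time. SAS fixes the variable's length when the step is compiled, using the first assignment statement it encounters for that variable, and it decides whether the variable is a string or a number based on the value being assigned to it.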

Consider the three variables initialized in this program:

    data test;
        numvar = 50;
        charvar1 = 'a';
        charvar2 = 'ab';
    run;

Running PROC CONTENTS on the dataset will show:

    Variable    Type    Len
    charvar1    Char    1
    charvar2    Char    2
    numvar      Num     8

These are the default assignments SAS gives on initialization. Notice that the numeric variable is automatically assigned a length of 8 bytes, while the character variables differ depending on the length of the value assigned. Any subsequent assignment of more than 1 character (or byte) to charvar1 will result in the value being truncated to 1 character. This is why you are seeing this phenomenon in your data.

In the first test2 dataset, _33_ct_temp is first assigned a value of 'a'. As with charvar1 in the test program, this initializes it with a length of 1, truncating the later value 'is' to 'i'. In the second test2 dataset, _33_ct_temp is first assigned a value of 'is', giving it a length of 2.
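The truncation can be reproduced in isolation. The following is only a minimal sketch: the dataset name trunc_demo and the variables recoded and stored_len are made up for illustration and do not come from the question.

```sas
data trunc_demo;
    recoded = 'a';                  /* first assignment the compiler sees: length is fixed at 1 */
    recoded = 'is';                 /* only the first character fits in the variable */
    stored_len = vlength(recoded);  /* compile-time length of the variable */
    put recoded= stored_len=;       /* writes both values to the log */
run;
```

The log line written by the PUT statement should show the one-character value together with a length of 1, which matches what happens to _33_ct_temp in the first test2 step.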

To get around this problem, you want to initialize your character variable first, with the maximum length you think it will need. Space isn't much of a concern anymore, so you can be a lot more liberal with your assignment. Of course, you could scan the column to find the maximum possible length, but if you have a massive dataset and not a whole lot of computing resources, it isn't worth it.
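If you do want to scan a column, one possible way is a PROC SQL query. This is only a sketch: it measures the existing column c_tnm_t from test1 as a stand-in for whatever column holds your values, and the macro variable name maxlen is an assumption introduced here.

```sas
proc sql noprint;
    /* longest non-padded value in the column, stored in a macro variable */
    select max(length(c_tnm_t)) into :maxlen
    from test1;
quit;

%let maxlen = &maxlen;   /* re-assign to strip leading blanks from the value */
%put Longest value in c_tnm_t: &maxlen characters;
```

The result could then be used in a LENGTH statement such as `length newvar $&maxlen;`, where newvar is again a hypothetical name. Note that this reads the whole table once, which is the cost mentioned above.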

You can set the length with a LENGTH statement, either at the beginning of your program or before you assign the first value to the variable:

    data test2;
        set test1;
        length _33_ct_temp $2;   /* declare the length before the first assignment */

        if c_tnm_t = 'azcd11' then _33_ct_temp = 'a';
        if c_tnm_t = 'azcd10' then _33_ct_temp = '0';
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;

You can also use the LENGTH statement as a way to set the column order of your variables. Columns are set in the order in which variables are initialized. If you changed the above program to:

    data test2;
        length _33_ct_temp $2;   /* declared before SET, so it becomes the first column */
        set test1;

        if c_tnm_t = 'azcd11' then _33_ct_temp = 'a';
        if c_tnm_t = 'azcd10' then _33_ct_temp = '0';
        if c_tnm_t = 'azcd12' then _33_ct_temp = 'is';
    run;

You will find that _33_ct_temp comes first. I use this trick a lot, particularly with large datasets containing lots of ID variables or dates. For example:

    data a;
        length date hour minute second cust_id trans_id 8
               first_name last_name $30;
        set have;
        <code>
    run;
