CUDA: is shared memory persistent from one kernel launch to another?


While trying to find out whether shared memory can be accessed by multiple kernels, I have found that sometimes the data in shared memory is still there when it is accessed by a later kernel, and sometimes it is not. What's more, when debugging the program with cuda-gdb, data written to shared memory by the previous kernel can be read by the next kernel.

The following is a piece of test code, run on 2 GPUs.

    #include <cstdio>

    extern __shared__ double f_ds[];

    __global__ void kernel_writetosharedmem(double *f_dev, int spd_x)
    {
        int tid_dev_x = blockDim.x * blockIdx.x + threadIdx.x;
        int tid_dev_y = blockDim.y * blockIdx.y + threadIdx.y;
        int tid_dev   = tid_dev_y * spd_x + tid_dev_x;

        if (tid_dev < blockDim.x * blockDim.y * gridDim.x * gridDim.y)
            f_ds[threadIdx.y * blockDim.x + threadIdx.x] = 0.12345;
        __syncthreads();
    }

    __global__ void kernel_readfromsharedmem(double *f_dev, int dev_no, int spd_x)
    {
        int tid_dev_x = blockDim.x * blockIdx.x + threadIdx.x;
        int tid_dev_y = blockDim.y * blockIdx.y + threadIdx.y;
        int tid_dev   = tid_dev_y * spd_x + tid_dev_x;

        if (tid_dev < blockDim.x * blockDim.y * gridDim.x * gridDim.y)
        {
            // f_ds is never written in this launch; its contents are undefined
            f_dev[tid_dev] = f_ds[threadIdx.y * blockDim.x + threadIdx.x];
            printf("threadid %d in dev [%d] having number %f\n",
                   tid_dev, dev_no, f_ds[threadIdx.y * blockDim.x + threadIdx.x]);
        }
        __syncthreads();
    }

    int main()
    {
        ...
        dim3 block_size(BLOCK_SIZE, BLOCK_SIZE);
        dim3 grid_size(spd_x / BLOCK_SIZE, spd_y / BLOCK_SIZE);

        for (int i = 0; i < ngpus; i++)
        {
            cudaSetDevice(i);
            kernel_writetosharedmem<<<grid_size, block_size,
                sizeof(double) * BLOCK_SIZE * BLOCK_SIZE, stream[i]>>>(f_dev[i], spd_x);
            cudaDeviceSynchronize();
        }

        for (int i = 0; i < ngpus; i++)
        {
            cudaSetDevice(i);
            kernel_readfromsharedmem<<<grid_size, block_size,
                sizeof(double) * BLOCK_SIZE * BLOCK_SIZE, stream[i]>>>(f_dev[i], i, spd_x);
            cudaDeviceSynchronize();
        }
        ...
    }

Four situations occurred after running the program:

1) dev0 0.12345, dev1 0;

2) dev0 0, dev1 0.12345;

3) dev0 0, dev1 0;

4) dev0 0.12345, dev1 0.12345.

When running in cuda-gdb, I get case 4).

Does this indicate that shared memory persists beyond one kernel? Or is shared memory only occasionally cleared or freed after a kernel finishes?

Shared memory is only guaranteed to have the scope and lifetime of the block to which it is assigned. Any attempt to re-use shared memory from block to block, or from kernel launch to kernel launch, is completely undefined behaviour, and should never be relied upon in a sane code design.
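If data really has to travel from one kernel to the next, the usual approach is to stage it through global memory, which does persist across launches until it is freed. The sketch below is a minimal illustration under stated assumptions: a 1-D launch with `kThreads` threads per block and a single GPU on the default stream. The kernel names `produce` and `consume`, the helper `stage_through_global`, and the block size are all hypothetical. The first kernel computes into shared memory and copies its tile out to global memory before the block exits; the second kernel reloads that tile into its own, freshly allocated shared memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical block size for this sketch; the dynamic shared-memory size
// passed at launch must match it.
constexpr int kThreads = 256;

// Kernel 1: work in shared memory, then persist the result in global memory,
// because the shared tile disappears when the block finishes.
__global__ void produce(double *g_buf)
{
    extern __shared__ double tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = 0.12345 + i;  // stand-in for real work
    __syncthreads();
    g_buf[i] = tile[threadIdx.x];     // explicit copy to global memory
}

// Kernel 2: a new launch gets uninitialised shared memory, so reload the
// tile from global memory instead of assuming the old contents survived.
__global__ void consume(const double *g_buf, double *g_out)
{
    extern __shared__ double tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = g_buf[i];
    __syncthreads();
    // Read a neighbour's slot to show the tile is shared within the block.
    int j = (threadIdx.x + 1) % blockDim.x;
    g_out[i] = tile[j];
}

// Hypothetical helper: runs both kernels and returns the number of wrong
// elements (0 on success), or -1 if a CUDA call fails.
int stage_through_global(int n_blocks)
{
    const int n = n_blocks * kThreads;
    const size_t bytes = n * sizeof(double);
    const size_t smem = kThreads * sizeof(double);
    double *d_buf = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_buf); return -1; }

    // Same (default) stream, so consume runs after produce has finished.
    produce<<<n_blocks, kThreads, smem>>>(d_buf);
    consume<<<n_blocks, kThreads, smem>>>(d_buf, d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    double *h_out = new double[n];
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            int block = i / kThreads, lane = i % kThreads;
            int src = block * kThreads + (lane + 1) % kThreads;
            if (h_out[i] != 0.12345 + src) ++mismatches;
        }
    }
    delete[] h_out;
    cudaFree(d_buf);
    cudaFree(d_out);
    return err == cudaSuccess ? mismatches : -1;
}
```

Calling `stage_through_global(4)` from host code should return 0. The second kernel gets the right values only because they were written to and re-read from global memory, not because any shared memory survived between the two launches.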

