c++ - Runtime cudaErrorInsufficientDriver error from cudaGetDeviceCount when compiling with nvcc, icpc


Problem

I have an FFT-based application that uses FFTW3. I am working on porting the application to a CUDA-based implementation using cuFFT. Compiling and running the FFT core of the application standalone within Nsight works fine. I have moved on from there to integrating the device code into my application.

When I run with the cuFFT core code integrated into my application, cudaGetDeviceCount returns a cudaErrorInsufficientDriver error, although it did not do so in the Nsight standalone run. This call is made at the beginning of the run, when I'm initializing the GPU.
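Since cudaErrorInsufficientDriver is documented as meaning that the installed driver is older than the CUDA runtime library, one way to narrow this down is to compare the two versions from inside the failing process. The following is only a minimal sketch, not code from the application: the helper names are hypothetical, and it assumes the two integers come from cudaDriverGetVersion and cudaRuntimeGetVersion, which encode a version as 1000 * major + 10 * minor (7000 for CUDA 7.0).

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, not part of the CUDA API. In a real program the two
// integers would come from cudaDriverGetVersion() and cudaRuntimeGetVersion(),
// which encode a version as 1000 * major + 10 * minor (e.g. 7000 for CUDA 7.0).

// Turn an encoded version such as 6050 into "6.5".
std::string formatCudaVersion(int encoded) {
    assert(encoded >= 0);
    int major = encoded / 1000;
    int minor = (encoded % 1000) / 10;
    return std::to_string(major) + "." + std::to_string(minor);
}

// True when the driver is at least as new as the runtime.
// cudaDriverGetVersion() reports 0 when no driver is installed,
// which also fails this check.
bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
    return driverVersion > 0 && driverVersion >= runtimeVersion;
}
```

If the driver value seen from inside the .so is 0 or lower than the runtime value while the standalone build reports a sufficient one, that would suggest the two builds are not loading the same libcuda.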

Background

I am running on CentOS 6, using CUDA 7.0 on a GeForce GTX 750, and icpc 12.1.5. I have also tested a small example using a GT 610. Both cards work in Nsight (and I've also compiled and run from the command line without problems, though not as extensively as within Nsight).

To integrate the cuFFT implementation of the FFT core into my application, I compiled and device-linked with nvcc and then used icpc (the Intel C++ compiler) to compile the host code and to link the device and host code to create a .so. I completed this step without errors or warnings (relying on this tutorial).

(The reasoning for why I'm using a .so has a fair amount of history and would require additional background. Suffice it to say that making a .so is required for my application.)

The tutorial points out that the compilation steps differ between generating a standalone executable (as in Nsight) and generating a device-linked library for inclusion in a .so. To get through compilation, I had to add -lcudart as described in the tutorial, as well as -lcuda, to my icpc linking call (as well as -L to add the .../cuda-7.0/lib64 and .../cuda-7.0/lib64/stubs paths to those libraries).

Note: nvcc links in libcudart by default. I'm assuming it does the same for libcuda, since Nsight doesn't include either of these libraries in any of its compile and linking steps. As an aside, I do find it strange that although nvcc links them in by default, they don't show up in a call to ldd on the executable.

I also had to add --compiler-options '-fpic' to my nvcc commands to avoid the errors described here.

I have seen some chatter (for one example, see this post) about Intel/nvcc compatibilities, but it looks like they arise at compile time with older versions of nvcc, so... I think I'm OK on that account.

Finally, here are the compile commands for compilation of my three .cu files (all are identical except for the name of the .cu file and the name of the .o file):

nvcc -ccbin g++ -Iinc -I/path/to/cuda/samples/common/inc -m64 -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 --relocatable-device-code=true --compile --compiler-options '-fpic' -o my_object_file1.o -c my_source_code_file1.cu

And here are the flags I pass to the device linking step:

nvcc -ccbin g++ -Iinc -I/path/to/cuda/samples/common/inc -m64 -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 --compiler-options '-fpic' --device-link my_object_file1.o my_object_file2.o my_object_file3.o -o my_device_linked_object_file.o

I don't need the -gencode flags for 30, 37, and 52, at least currently, but they shouldn't cause problems, and eventually I will compile that way.

And here are my compile flags (minus the -o flag and the -I flags) that I use for the .cc file that makes calls to the CUDA library:

-c -fpic -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fno-operator-names -D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -Dm2klite -Dgcc_ -std=gnu++98 -O2 -fp-model source -gcc -wd1881 -vec-report0

Finally, here are my linking flags:

-pthread -shared 

Any ideas on how to fix this problem?

Don't add .../cuda7.0/lib64/stubs to LD_LIBRARY_PATH. If you do, you will pick up libcuda.so from there instead of from the driver. (See this post.)
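A quick way to check for this is to scan the library search path for a stubs directory. The following is a minimal sketch using only the C++ standard library; the function names findStubEntries and stubEntriesInEnvironment are hypothetical, and it assumes the usual colon-separated LD_LIBRARY_PATH format.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return every entry of a colon-separated search path
// whose final directory component is "stubs" (trailing slashes are ignored).
std::vector<std::string> findStubEntries(const std::string& searchPath) {
    std::vector<std::string> hits;
    std::istringstream in(searchPath);
    std::string entry;
    while (std::getline(in, entry, ':')) {
        std::string dir = entry;
        while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
        std::size_t slash = dir.find_last_of('/');
        std::string last =
            (slash == std::string::npos) ? dir : dir.substr(slash + 1);
        if (last == "stubs") hits.push_back(entry);
    }
    return hits;
}

// Convenience wrapper that reads LD_LIBRARY_PATH from the current environment.
std::vector<std::string> stubEntriesInEnvironment() {
    const char* value = std::getenv("LD_LIBRARY_PATH");
    return findStubEntries(value ? value : "");
}
```

An empty result from stubEntriesInEnvironment(), called in the environment the application actually runs in, means no stubs directory is on the search path. The stub libcuda.so is only meant to satisfy the linker at build time, so it can stay on the -L path without being on LD_LIBRARY_PATH.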

