c++ - Runtime cudaErrorInsufficientDriver error from cudaGetDeviceCount when compiling with nvcc, icpc
Problem
I have an FFT-based application that uses FFTW3. I am working on porting the application to a CUDA-based implementation using cuFFT. Compiling and running the FFT core of the application standalone within Nsight works fine. I have moved on from there to integrating the device code into my application.
When I run with the cuFFT core code integrated into my application, cudaGetDeviceCount returns a cudaErrorInsufficientDriver error, although it did not in the Nsight standalone run. The call is made at the beginning of the run, when I'm initializing the GPU.
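For context, cudaErrorInsufficientDriver is documented as meaning that the installed driver is older than the CUDA runtime library. The two versions involved are reported by cudaDriverGetVersion and cudaRuntimeGetVersion as integers encoded as 1000 * major + 10 * minor (7000 for CUDA 7.0). The sketch below only illustrates that comparison in plain C++, with the CUDA calls left in a comment so it builds without the toolkit; the CudaVersion struct and the helper names are hypothetical, not part of any CUDA API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: a decoded CUDA version number.
struct CudaVersion {
    int major_version;
    int minor_version;
};

// Decode the integer form used by cudaDriverGetVersion and
// cudaRuntimeGetVersion: 1000 * major + 10 * minor (7000 means 7.0).
CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when the driver reports support for at least the runtime's version.
bool driver_is_sufficient(int driver_version, int runtime_version) {
    return driver_version >= runtime_version;
}

// Build a one-line summary suitable for a startup log message.
std::string describe(int driver_version, int runtime_version) {
    const CudaVersion d = decode_cuda_version(driver_version);
    const CudaVersion r = decode_cuda_version(runtime_version);
    std::string msg = "driver supports CUDA " +
                      std::to_string(d.major_version) + "." +
                      std::to_string(d.minor_version) +
                      ", runtime is CUDA " +
                      std::to_string(r.major_version) + "." +
                      std::to_string(r.minor_version);
    msg += driver_is_sufficient(driver_version, runtime_version)
               ? " (ok)"
               : " (insufficient driver)";
    return msg;
}

// In a CUDA build, the inputs would come from the runtime API:
//   int drv = 0, rt = 0;
//   cudaDriverGetVersion(&drv);
//   cudaRuntimeGetVersion(&rt);
//   std::string summary = describe(drv, rt);
```

Logging such a summary right before the cudaGetDeviceCount call would show which driver version the .so build actually sees, which can then be compared with what the standalone Nsight build reports.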
Background
I am running on CentOS 6, using CUDA 7.0 on a GeForce GTX 750, with icpc 12.1.5. I have also tested a small example using a GT 610. Both cards work in Nsight (and I've also compiled and run from the command line without problems, though not as extensively as within Nsight).
To integrate the cuFFT implementation of the FFT core into my application, I compiled and device-linked with nvcc, and then used icpc (the Intel C++ compiler) to compile the host code and link the device and host code to create a .so. I completed that step without errors or warnings (relying on this tutorial).
(The reasoning for why I'm using a .so has a fair amount of history and additional background. Suffice it to say that making a .so is required for my application.)
The tutorial points out that the compilation steps differ between generating a standalone executable (as in Nsight) and generating a device-linked library for inclusion in a .so. To get through compilation, I had to add -lcudart as described in the tutorial, as well as -lcuda, to my icpc linking call (along with -L options to add .../cuda-7.0/lib64 and .../cuda-7.0/lib64/stubs as paths to those libraries).
Note: nvcc links in libcudart by default. I'm assuming it does the same for libcuda, since Nsight doesn't include either of these libraries in any of its compile and linking steps. As an aside, I find it strange that although nvcc links them in by default, they don't show up in a call to ldd on the executable.
I also had to add --compiler-options '-fpic' to my nvcc commands to avoid the errors described here.
I have seen some chatter (for one example, see this post) about Intel/nvcc compatibility issues, but it looks like they arise at compile time with older versions of nvcc, so I think I'm OK on that account.
Finally, here are the compile commands for the three .cu files (all identical except for the name of the .cu file and the name of the .o file):
nvcc -ccbin g++ -Iinc -I/path/to/cuda/samples/common/inc -m64 -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 --relocatable-device-code=true --compile --compiler-options '-fpic' -o my_object_file1.o -c my_source_code_file1.cu
And here is the command I use for the device linking step:
nvcc -ccbin g++ -Iinc -I/path/to/cuda/samples/common/inc -m64 -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 --compiler-options '-fpic' --device-link my_object_file1.o my_object_file2.o my_object_file3.o -o my_device_linked_object_file.o
I don't need the -gencode flags for 30, 37, and 52, at least currently, but they shouldn't cause problems, and eventually I will compile that way.
And here are the compile flags (minus the -o flag and the -I flags) that I use for the .cc file that calls the CUDA library:
-c -fpic -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fno-operator-names -D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -DM2KLITE -DGCC_ -std=gnu++98 -O2 -fp-model source -gcc -wd1881 -vec-report0
Finally, here are my linking flags:
-pthread -shared
Any ideas on how to fix this problem?
Don't add .../cuda7.0/lib64/stubs to your LD_LIBRARY_PATH. If you do, you will pick up libcuda.so from there instead of from the driver. (See this post.)
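One way to guard against this at startup is to scan LD_LIBRARY_PATH for a stubs directory before initializing the GPU. The following is a minimal sketch in plain C++, not an official check: the function names are hypothetical, and it assumes the stub directory is literally named stubs, as in the CUDA 7.0 layout mentioned in the question.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path such as LD_LIBRARY_PATH.
// Empty entries are skipped in this sketch.
std::vector<std::string> split_search_path(const std::string& path) {
    std::vector<std::string> entries;
    std::stringstream stream(path);
    std::string entry;
    while (std::getline(stream, entry, ':')) {
        if (!entry.empty()) {
            entries.push_back(entry);
        }
    }
    return entries;
}

// Return the entries whose final path component is "stubs"
// (trailing slashes are ignored when comparing).
std::vector<std::string> find_stub_dirs(const std::string& path) {
    std::vector<std::string> hits;
    for (const std::string& entry : split_search_path(path)) {
        std::string trimmed = entry;
        while (trimmed.size() > 1 && trimmed.back() == '/') {
            trimmed.pop_back();
        }
        const std::string::size_type slash = trimmed.find_last_of('/');
        const std::string last =
            (slash == std::string::npos) ? trimmed : trimmed.substr(slash + 1);
        if (last == "stubs") {
            hits.push_back(entry);
        }
    }
    return hits;
}

// Convenience wrapper that inspects the current process environment.
std::vector<std::string> stub_dirs_in_environment() {
    const char* value = std::getenv("LD_LIBRARY_PATH");
    return find_stub_dirs(value ? value : "");
}
```

If stub_dirs_in_environment() returns a non-empty list, the application could print the offending entries and stop before calling cudaGetDeviceCount. Passing the stubs directory with -L at link time is a separate matter from the runtime search path that this sketch inspects.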