python - Why does scipy linear interpolation run faster than nearest neighbor interpolation? -
i've written routine interpolates point data onto regular grid. however, find scipy's implementation of nearest neighbor interpolation performs twice slow radial basis function i'm using linear interpolation (scipy.interpolate.rbf)
relevant code includes how interpolators constructed
if interpolation_mode == 'linear': interpolator = scipy.interpolate.rbf( point_array[:, 0], point_array[:, 1], value_array, function='linear', smooth=.01) elif interpolation_mode == 'nearest': interpolator = scipy.interpolate.nearestndinterpolator( point_array, value_array) and when interpolation called
result = interpolator(col_coords.ravel(), row_coords.ravel()) the sample i'm running on has 27 input interpolant value points , i'm interpolating across 20000 x 20000 grid. (i'm doing in memory block sizes i'm not exploding computer btw.)
below result of 2 cprofiles i've run on relevant code. note nearest neighbor scheme runs in 406 seconds while linear scheme runs in 256 seconds. nearest scheme dominated calls scipy's kdtree, seems reasonable, except rbf outperforms significant amount of time. ideas why or make nearest scheme run faster linear?
linear run:
25362 function calls in 225.886 seconds ordered by: internal time list reduced 328 10 due restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 253 169.302 0.669 207.516 0.820 c:\python27\lib\site-packages\scipy\interpolate\rbf.py:112( _euclidean_norm) 258 38.211 0.148 38.211 0.148 {method 'reduce' of 'numpy.ufunc' objects} 252 6.069 0.024 6.069 0.024 {numpy.core._dotblas.dot} 1 5.077 5.077 225.332 225.332 c:\python27\lib\site-packages\pygeoprocessing-0.3.0a8.post2 8+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri) 252 1.849 0.007 2.137 0.008 c:\python27\lib\site-packages\numpy\lib\function_base.py:32 85(meshgrid) 507 1.419 0.003 1.419 0.003 {method 'flatten' of 'numpy.ndarray' objects} 1268 1.368 0.001 1.368 0.001 {numpy.core.multiarray.array} 252 1.018 0.004 1.018 0.004 {_gdal_array.bandrasterionumpy} 1 0.533 0.533 225.886 225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate) 252 0.336 0.001 216.716 0.860 c:\python27\lib\site-packages\scipy\interpolate\rbf.py:225( __call__) nearest neighbor run:
27539 function calls in 405.624 seconds ordered by: internal time list reduced 309 10 due restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 252 397.806 1.579 397.822 1.579 {method 'query' of 'ckdtree.ckdtree' objects} 252 1.875 0.007 1.881 0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays} 252 1.831 0.007 2.101 0.008 c:\python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid) 252 1.034 0.004 400.739 1.590 c:\python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__) 1 1.009 1.009 405.030 405.030 c:\python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri) 252 0.719 0.003 0.719 0.003 {_gdal_array.bandrasterionumpy} 1 0.509 0.509 405.624 405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate) 252 0.261 0.001 0.261 0.001 {numpy.core.multiarray.copyto} 27 0.125 0.005 0.125 0.005 {_ogr.layer_createfeature} 1 0.116 0.116 0.254 0.254 c:\python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data) for reference, i'm including visual result of these 2 test cases.
nearest
linear
running example in griddata doc:
in [47]: def func(x, y): return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2 ....: in [48]: points = np.random.rand(1000, 2) in [49]: values = func(points[:,0], points[:,1]) in [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j] so have 1000 scattered points, , interpolate @ 20,000.
in [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y), method='nearest') 10 loops, best of 3: 83.6 ms per loop in [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y), method='linear') 1 loops, best of 3: 24.6 ms per loop in [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y), method='cubic') 10 loops, best of 3: 42.7 ms per loop and 2 stage interpolators:
in [55]: %%timeit rbfi = interpolate.rbf(points[:,0],points[:,1],values) dl = rbfi(grid_x.ravel(),grid_y.ravel()) ....: 1 loops, best of 3: 3.89 s per loop in [56]: %%timeit ndi=interpolate.nearestndinterpolator(points, values) dl=ndi(grid_x.ravel(),grid_y.ravel()) ....: 10 loops, best of 3: 82.6 ms per loop in [57]: %%timeit ldi=interpolate.linearndinterpolator(points, values) dl=ldi(grid_x.ravel(),grid_y.ravel()) .... 10 loops, best of 3: 25.1 ms per loop griddata 1 step cover call these last 2 versions.
griddata describes methods as:
nearest return value @ data point closest point of interpolation. see nearestndinterpolator more details. uses scipy.spatial.ckdtree linear tesselate input point set n-dimensional simplices, , interpolate linearly on each simplex. linearndinterpolator details are: interpolant constructed triangulating input data qhull [r37], , on each triangle performing linear barycentric interpolation. cubic (2-d) return value determined piecewise cubic, continuously differentiable (c1), , approximately curvature-minimizing polynomial surface. see cloughtocher2dinterpolator more details. further tests on 2 stage versions shows setting nearest ckttree fast; of time spent in 2nd interpolation state.
on other hand, setting triangulated surface takes longer linear interpolation.
i don't know enough of rbf method why slower. underlying methods different intuitions developed simple manual methods of interpolation don't mean much.
your example starts fewer scattered points, , interpolates on finer grid.


Comments
Post a Comment