1-of-k (one-hot) encoding in Theano
I'm doing this in numpy. seq is a list of class indices, i.e. this implements 1-of-k encoding (also called one-hot):

    def one_of_k(seq, num_classes):
        num_frames = len(seq)
        m = np.zeros((num_frames, num_classes))
        m[np.arange(num_frames), seq] = 1
        return m

How do I do the same thing in Theano? (The most efficient solution would be best, i.e. one that is also efficient under CUDA.)
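For reference, the question's NumPy approach packaged as a self-contained, runnable snippet (the function is named one_of_k here, since a Python identifier cannot start with a digit):

```python
import numpy as np

def one_of_k(seq, num_classes):
    # One row per frame, one column per class, all zeros initially.
    num_frames = len(seq)
    m = np.zeros((num_frames, num_classes))
    # Fancy indexing: set position (i, seq[i]) to 1 for every row i.
    m[np.arange(num_frames), seq] = 1
    return m

print(one_of_k(np.array([0, 2, 1]), 3))
```

Each row of the result has a single 1 in the column given by the corresponding index in seq.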
There is a built-in function for this (theano.tensor.extra_ops.to_one_hot), but it is still slower than doing it in numpy. If it's feasible for your task, you might be better off computing the encoding outside Theano and passing the dense result in as an input, instead of passing the indices.
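A minimal sketch of that precompute-outside-Theano idea, in plain NumPy (the Theano side is only described in the comment; the function name precompute_one_hot is my own):

```python
import numpy as np

def precompute_one_hot(seq, num_classes):
    # Build the dense 1-of-k matrix in NumPy, outside the Theano graph.
    m = np.zeros((len(seq), num_classes), dtype=np.float32)
    m[np.arange(len(seq)), seq] = 1
    return m

# The dense result would then be fed to a compiled Theano function
# through an ordinary tt.matrix() input, instead of the index vector.
dense = precompute_one_hot(np.array([1, 0, 3]), 4)
print(dense)
```

This moves the encoding cost out of the compiled graph entirely, at the price of transferring a dense matrix instead of an index vector.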
Here's some code illustrating 3 numpy methods and 4 Theano methods. The code includes the answers provided by Albert (numpy_1_of_k_3/compile_theano_1_of_k_3) and eickenberg (numpy_1_of_k_2/compile_theano_1_of_k_4) for comparison.

It turns out that the built-in Theano method (compile_theano_1_of_k_2) uses the same code as my own attempt (numpy_1_of_k_1/compile_theano_1_of_k_1).
    import timeit

    import numpy as np
    import theano
    import theano.tensor as tt
    import theano.tensor.extra_ops


    def numpy_1_of_k_1(seq, num_classes):
        num_frames = len(seq)
        m = np.zeros((num_frames, num_classes))
        m[np.arange(num_frames), seq] = 1
        return m


    def numpy_1_of_k_2(seq, num_classes):
        return seq[:, np.newaxis] == np.arange(num_classes)


    def numpy_1_of_k_3(seq, num_classes):
        shape = [seq.shape[i] for i in range(seq.ndim)] + [num_classes]
        eye = np.eye(num_classes)
        return eye[seq].reshape(shape)


    def compile_theano_1_of_k_1():
        seq = tt.lvector()
        num_classes = tt.lscalar()
        num_frames = seq.shape[0]
        m = tt.zeros((num_frames, num_classes))
        m = tt.set_subtensor(m[tt.arange(num_frames), seq], 1)
        return theano.function([seq, num_classes], outputs=m)


    def compile_theano_1_of_k_2():
        seq = tt.lvector()
        num_classes = tt.lscalar()
        return theano.function(
            [seq, num_classes],
            outputs=theano.tensor.extra_ops.to_one_hot(seq, num_classes))


    def compile_theano_1_of_k_3():
        seq = tt.lvector()
        num_classes = tt.lscalar()
        shape = [seq.shape[i] for i in range(seq.ndim)] + [num_classes]
        eye = tt.eye(num_classes)
        m = eye[seq].reshape(shape)
        return theano.function([seq, num_classes], outputs=m)


    def compile_theano_1_of_k_4():
        seq = tt.lvector()
        num_classes = tt.lscalar()
        one_hot = tt.eq(seq.reshape((-1, 1)), tt.arange(num_classes))
        return theano.function([seq, num_classes], outputs=one_hot)


    def main(iterations):
        theano_1_of_k_1 = compile_theano_1_of_k_1()
        theano_1_of_k_2 = compile_theano_1_of_k_2()
        theano_1_of_k_3 = compile_theano_1_of_k_3()
        theano_1_of_k_4 = compile_theano_1_of_k_4()
        test_seq = np.array([0, 1, 2, 0, 1, 2])
        test_num_classes = 4
        test_functions = [numpy_1_of_k_1, numpy_1_of_k_2, numpy_1_of_k_3,
                          theano_1_of_k_1, theano_1_of_k_2, theano_1_of_k_3,
                          theano_1_of_k_4]
        test_results = [test_function(test_seq, test_num_classes)
                        for test_function in test_functions]
        for a, b in zip(test_results[:-1], test_results[1:]):
            assert np.all(np.equal(a, b)), (a, b)
        data = []
        for _ in xrange(iterations):
            num_classes = np.random.randint(100) + 1
            seq = np.random.randint(num_classes, size=(np.random.randint(100) + 1))
            data.append((seq, num_classes))
        for test_function in test_functions:
            start = timeit.default_timer()
            total = 0
            for seq, num_classes in data:
                total += test_function(seq, num_classes).sum()
            print timeit.default_timer() - start, total


    main(100000)
Using my laptop and running the Theano code on the CPU, I get the following timings in seconds:
    numpy_1_of_k_1   1.0645
    numpy_1_of_k_2   1.4018
    numpy_1_of_k_3   1.6131
    theano_1_of_k_1  6.3542
    theano_1_of_k_2  6.4628
    theano_1_of_k_3  6.5637
    theano_1_of_k_4  5.4588
So in numpy, the identity approach is slower than the simple broadcast, which in turn is slower than setting zeros. In Theano the relative performance order differs: here the simple broadcast approach is the fastest.

These are quite small test cases, so the relative performances may differ for larger matrices, or when running on the GPU.
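To make the comparison concrete, here is a minimal NumPy check that the set-zeros (scatter) and broadcast approaches produce the same matrix; note the broadcast version yields booleans rather than floats:

```python
import numpy as np

seq = np.array([2, 0, 1, 2])
num_classes = 3

# Set-zeros approach: allocate zeros, scatter 1s via fancy indexing.
m1 = np.zeros((len(seq), num_classes))
m1[np.arange(len(seq)), seq] = 1

# Broadcast approach: compare the column of indices against the row
# of class labels; broadcasting expands both to (len(seq), num_classes).
m2 = seq[:, np.newaxis] == np.arange(num_classes)

assert np.array_equal(m1, m2)
```

If a float matrix is required downstream, the boolean result needs an explicit cast (e.g. m2.astype(np.float32)), which is part of what the benchmark above is measuring.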