View on GitHub


Fast k nearest neighbor search using GPU

Download this project as a .zip file Download this project as a tar.gz file


The k-nearest neighbor (kNN) search is a problem found in many research and industrial domains such as 3-dimensional object rendering, content-based image retrieval, statistics (estimation of entropies and divergences), biology (gene classification), etc. In spite of some improvements in the last decades, the computation time required by the kNN search still remains the bottleneck of methods based on kNN, especially in high dimensionnal spaces.

We propose in this website two GPGPU implementations of the brute-force (exhaustive) kNN search algorithm. These implementations, through the APIs NVIDIA CUDA and NVIDIA CUBLAS, paralellize the kNN search process into the computer graphic card. Thanks to the parallel structure of the GPU and the optimization of the APIs, the proposed implementation appears to be much faster, especially in high dimension and for large dataset, than the celebrated and highly optimized ANN C++ library.


Source code information

The provided CUDA code is usable for C, C++ and Matlab. A preprocessor constant in the file's header allows to activate or deactivate the Matlab wrapper. Please read the README file to learn how to adapt the code. We provide 2 implementations of the kNN search to fit to your needs:

Most applications need to identify the kNN points by their index. Some other applications however are only interested in the distances to the k-th nearest neighbor (e.g. entropy estimation). We thus provide 2 specific implementations:

In total, we provide 4 specific implementations for solving the kNN search problem.

Important note

The provided code was written in the context of a Ph.D. thesis. The code was intended to provide a fast and generic solution to the challenging kNN search problem. However, authors are well aware that for a very specific application (e.g. small number of points, small dimension, only one query point, etc.) there is often a better implementation either in terms of computation time or memory footprint. It is known that a GPGPU implementation should be adapted to the problem and to the specifications of the GPU used. This code will help one to identify if the GPGPU solution is viable for the given problem. Following the terms of the license below, one is allowed and should adapt the code.


Following the terms of the license below, you need to acknowledge the use of this code. Do so by referencing the following articles.


The provided code is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. You are free:

Under the following conditions: