FLANN (Fast Library for Approximate Nearest Neighbors) Official Manual, Translated

Tags: algorithms, experience

by Oxidane Lin

This article is a translation of FLANN's official documentation. The original is written in LaTeX and has been converted to Markdown for easier reading. Owing to limits of time and ability, only the C++ part has been translated; for the C, MATLAB and Python bindings please refer to the corresponding sections of the original.

Introduction

We can define the nearest neighbor search (NNS) problem in the
following way: given a set of points $P = \{p_1, p_2, \dots, p_n\}$ in a metric
space $X$, these points must be preprocessed in such a way that given a
new query point $q \in X$, finding the point in $P$ that is nearest to
$q$ can be done quickly.

The problem of nearest neighbor search is one of major importance in a
variety of applications such as image recognition, data compression,
pattern recognition and classification, machine learning, document
retrieval systems, statistics and data analysis. However, solving this
problem in high dimensional spaces seems to be a very difficult task and
there is no algorithm that performs significantly better than the
standard brute-force search. This has led to an increasing interest in
a class of algorithms that perform approximate nearest neighbor
searches, which have proven to be a good-enough approximation in most
practical applications and, in most cases, orders of magnitude faster
than the algorithms performing exact searches.

FLANN (Fast Library for Approximate Nearest Neighbors) is a library for
performing fast approximate nearest neighbor searches. FLANN is written
in the C++ programming language. FLANN can be easily used in many
contexts through the C, MATLAB, Python, and Ruby bindings provided with
the library.


Quick Start

This section contains a small example of how to use the FLANN library
from C++. (The original manual also contains C, MATLAB, Python and Ruby
examples; only the C++ one is reproduced in this translation.)

  • C++
        // file flann_example.cpp
        // a minimal example
        #include <flann/flann.hpp>
        #include <flann/io/hdf5.h>
        #include <stdio.h>
        int main(int argc, char** argv)
        {
    
            int nn = 3;
            flann::Matrix<float> dataset;
            flann::Matrix<float> query;
            flann::load_from_file(dataset, "dataset.hdf5","dataset");
            flann::load_from_file(query, "dataset.hdf5","query");
            flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
            flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);
            // construct a randomized kd-tree index using 4 kd-trees
            flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
            index.buildIndex();                                                                                               
            // do a knn search, using 128 checks
            index.knnSearch(query, indices, dists, nn, flann::SearchParams(128));
            flann::save_to_file(indices,"result.hdf5","result");
            delete[] dataset.ptr();
            delete[] query.ptr();
            delete[] indices.ptr();
            delete[] dists.ptr();
            
            return 0;
        }

Downloading and compiling FLANN

FLANN can be downloaded from the following address:

http://www.cs.ubc.ca/~mariusm/flann

After downloading and unpacking, the following files and directories
should be present:

  • bin: directory for various scripts and binary files

  • doc: directory containing this documentation

  • examples: directory containing examples of using FLANN

  • src: directory containing the source files

  • test: directory containing unit tests for FLANN

To compile the FLANN library the CMake build system is required.
Below is an example of how FLANN can be compiled on Linux (replace x.y.z
with the corresponding version number).

$ cd flann-x.y.z-src
$ mkdir build
$ cd build
$ cmake ..
$ make

On Windows the steps are very similar:

> "C:\Program Files\Microsoft Visual Studio 9.0\VC\vcvarsall.bat"
> cd flann-x.y.z-src
> mkdir build
> cd build
> cmake ..
> nmake

There are several compile options that can be configured before FLANN is
compiled, for example the build type (Release, RelWithDebInfo, Debug) or
whether to compile the C, Python or the MATLAB bindings. To change any
of these options, use the cmake-gui application after cmake has
finished (see the figure below).

> cmake-gui .

[Figure: Configuring the FLANN compile options]

Upgrading from a previous version

This section contains changes that you need to be aware of when
upgrading from a previous version of FLANN.

Version 1.8.0

Due to changes in the library, the on-disk format of the saved
indexes has changed and it is not possible to load indexes
previously saved with an older version of the library.

Version 1.7.0

A small breaking API change in flann::Matrix requires client code
to be updated. In order to release the memory used by a matrix, use:

  delete[] matrix.ptr();

instead of:

  delete[] matrix.data;

or:

  matrix.free();

The data member of flann::Matrix is no longer publicly accessible;
use the ptr() method to obtain the pointer to the memory
buffer.

Compiling FLANN with multithreading support

To take advantage of multithreaded search, the project that uses FLANN needs to be compiled with a compiler that supports the OpenMP standard, and OpenMP support must be enabled. The number of cores to be used can be selected with the cores field in the SearchParams structure. By default a single core will be used. Setting the cores field to zero will automatically use as many threads as there are cores available on the machine, as sketched below.

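As a minimal sketch (the `dataset`, `query`, `indices` and `dists` matrices are assumed to be prepared as in the Quick Start example above), enabling multithreaded search only requires setting the cores field of the SearchParams object passed to the search call:

    // build a randomized kd-tree index as in the Quick Start example
    flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
    index.buildIndex();

    flann::SearchParams params(128);
    params.cores = 0;   // 0 = use as many threads as there are cores
    index.knnSearch(query, indices, dists, 3, params);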

Using FLANN

Using FLANN from C++

The core of the FLANN library is written in C++. To make use of the full power and flexibility of the templated code one should use the C++ bindings if possible. To use the C++ bindings you only need to include the library header file flann.hpp. An example of the compile command that must be used will look something like this:

g++ flann_example.cpp -I $FLANN_ROOT/include -o flann_example_cpp

where $FLANN_ROOT is the library main directory.

The following sections describe the public C++ API.


flann::Index

The FLANN nearest neighbor index class. This class is used to abstract different types of nearest neighbor search indexes. The class is templated on the distance functor to be used for computing distances between pairs of features.


    namespace flann
    {
        template<typename Distance>
        class Index
        {
            typedef typename Distance::ElementType ElementType;
            typedef typename Distance::ResultType DistanceType;
        public:
            Index(const IndexParams& params, Distance distance = Distance() );

            Index(const Matrix<ElementType>& points, const IndexParams& params,
                    Distance distance = Distance() );

            ~Index();

            void buildIndex();

            void buildIndex(const Matrix<ElementType>& points);

            void addPoints(const Matrix<ElementType>& points,
                            float rebuild_threshold = 2);

            void removePoint(size_t point_id);

            ElementType* getPoint(size_t point_id);

            int knnSearch(const Matrix<ElementType>& queries,
                            Matrix<int>& indices,
                            Matrix<DistanceType>& dists,
                            size_t knn,
                            const SearchParams& params);

            int knnSearch(const Matrix<ElementType>& queries,
                            std::vector< std::vector<int> >& indices,
                            std::vector<std::vector<DistanceType> >& dists,
                            size_t knn,
                            const SearchParams& params);

            int radiusSearch(const Matrix<ElementType>& queries,
                                Matrix<int>& indices,
                                Matrix<DistanceType>& dists,
                                float radius,
                                const SearchParams& params);

            int radiusSearch(const Matrix<ElementType>& queries,
                                std::vector< std::vector<int> >& indices,
                                std::vector<std::vector<DistanceType> >& dists,
                                float radius,
                                const SearchParams& params);

            void save(std::string filename);

            int veclen() const;

            int size() const;

            IndexParams getParameters() const;

            flann_algorithm_t getType() const;
        };
    }

The Distance functor

The distance functor is a class whose operator() computes the distance between two features. If the distance is also a kd-tree compatible distance it should also provide an accum_dist() method that computes the distance between individual feature dimensions. A typical distance functor looks like this (see the dist.h file for more examples):


    template<class T>
    struct L2
    {
        typedef bool is_kdtree_distance;    // marks this as a kd-tree compatible distance
        typedef T ElementType;              // type of the elements the distance operates on
        typedef typename Accumulator<T>::Type ResultType;   // type used to accumulate the result

        template <typename Iterator1, typename Iterator2>
        ResultType operator()(Iterator1 a, Iterator2 b, size_t size,
                              ResultType /*worst_dist*/ = -1) const
        {
            ResultType result = ResultType();
            ResultType diff;
            for(size_t i = 0; i < size; ++i ) {
                diff = *a++ - *b++;    // difference in each dimension
                result += diff*diff;   // sum of squared differences
            }
            return result;
        }

        template <typename U, typename V>
        inline ResultType accum_dist(const U& a, const V& b, int) const
        {
            return (a-b)*(a-b);        // partial (single-dimension) distance
        }
    };

In addition to operator() and accum_dist(), a distance functor
should also define the ElementType and the ResultType as the types
of the elements it operates on and the type of the result it computes.


If a distance functor can be used as a kd-tree distance (meaning that the full distance between a pair of features can be accumulated from the partial distances between the individual dimensions) a typedef is_kdtree_distance should be present inside the distance functor. If the distance is not a kd-tree distance, but it's a distance in a vector space (the individual dimensions of the elements it operates on can be accessed independently) a typedef is_vector_space_distance should be defined inside the functor. If neither typedef is defined, the distance is assumed to be a metric distance and will only be used with indexes operating on generic metric distances.
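For illustration only, a hypothetical user-defined functor following these conventions might look like the sketch below. It is an L1-style (absolute difference) distance; the name SimpleL1 is not part of FLANN (the library already ships flann::L1), and the sketch assumes the FLANN headers are included so that flann::Accumulator is visible:

    // hypothetical kd-tree compatible distance functor (illustration only)
    template<class T>
    struct SimpleL1
    {
        // the full distance is a sum of per-dimension distances,
        // so the functor can be used with the kd-tree indexes
        typedef bool is_kdtree_distance;
        typedef T ElementType;
        typedef typename flann::Accumulator<T>::Type ResultType;

        template <typename Iterator1, typename Iterator2>
        ResultType operator()(Iterator1 a, Iterator2 b, size_t size,
                              ResultType /*worst_dist*/ = -1) const
        {
            ResultType result = ResultType();
            for (size_t i = 0; i < size; ++i, ++a, ++b) {
                // accumulate the absolute difference in each dimension
                result += (*a > *b) ? ResultType(*a - *b) : ResultType(*b - *a);
            }
            return result;
        }

        template <typename U, typename V>
        inline ResultType accum_dist(const U& a, const V& b, int) const
        {
            // partial distance along a single dimension
            return (a > b) ? ResultType(a - b) : ResultType(b - a);
        }
    };

Such a functor would then be used as the template argument of the index, e.g. flann::Index<SimpleL1<float> >.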

flann::Index::Index

Constructs a nearest neighbor search index for a given dataset.

    Index(const IndexParams& params, Distance distance = Distance() );
    Index(const Matrix<ElementType>& points, const IndexParams& params,
            Distance distance = Distance() );
points
:   Matrix containing the features (points) that should be indexed,
    stored in a row-major order (one point on each row of the matrix).
    The size of the matrix is $num\_features \times dimensionality$.

params
:   Structure containing the index parameters. The type of index that
    will be constructed depends on the type of this parameter. The
    possible parameter types are:

LinearIndexParams
When passing an object of this type, the index will perform a linear, brute-force search.

    struct LinearIndexParams : public IndexParams 
    {
    
    };

KDTreeIndexParams
When passing an object of this type the index constructed will consist of a set of randomized kd-trees which will be searched in parallel.

    struct KDTreeIndexParams : public IndexParams
    {
    
            KDTreeIndexParams( int trees = 4 );
    };
trees
:   The number of parallel kd-trees to use. Good values are in the
    range [1..16].

KMeansIndexParams
When passing an object of this type the index constructed will be a hierarchical k-means tree.
(Translator's note: k-means clustering groups the points around k centers, assigning each point to its nearest center and iteratively updating the centers until the best clustering is reached.)

    struct KMeansIndexParams : public IndexParams
    {
    
        KMeansIndexParams( int branching = 32,
                int iterations = 11,
                flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
                float cb_index = 0.2 );
    };
branching
:   The branching factor to use for the hierarchical k-means tree.

iterations
:   The maximum number of iterations to use in the k-means
    clustering stage when building the k-means tree. If a value of
    -1 is used here, it means that the k-means clustering should be
    iterated until convergence.

centers\_init
:   The algorithm to use for selecting the initial centers
    when performing a k-means clustering step. The possible values
    are CENTERS\_RANDOM (picks the initial cluster centers
    randomly), CENTERS\_GONZALES (picks the initial centers using
    Gonzales' algorithm) and CENTERS\_KMEANSPP (picks the initial
    centers using the algorithm suggested in @arthur_kmeanspp_2007).

cb\_index
:   This parameter (cluster boundary index) influences the
    way exploration is performed in the hierarchical kmeans tree.
    When `cb_index` is zero the next kmeans domain to be explored is
    chosen to be the one with the closest center. A value greater
    than zero also takes into account the size of the domain.

CompositeIndexParams
When using a parameters object of this type the index created combines the randomized kd-trees and the hierarchical k-means tree.

    struct CompositeIndexParams : public IndexParams
    {
    
            CompositeIndexParams( int trees = 4,
                                int branching = 32,
                                int iterations = 11,
                                flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM, 
                                float cb_index = 0.2 );
    };

KDTreeSingleIndexParams
When passing an object of this type the index will contain a single kd-tree optimized for searching lower dimensionality data (for example 3D point clouds)

    struct KDTreeSingleIndexParams : public IndexParams
    {
    
            KDTreeSingleIndexParams( int leaf_max_size = 10 );
    };
leaf\_max\_size
:   The maximum number of points to have in a leaf for not branching
    the tree any more.

KDTreeCuda3dIndexParams
When passing an object of this type the index will be a single kd-tree that is built and performs searches on a CUDA compatible GPU. Search performance is best for large numbers of search and query points. For more information, see section [sec:flann::cuda]

    struct KDTreeCuda3dIndexParams : public IndexParams
    {
    
        KDTreeCuda3dIndexParams( int leaf_max_size = 64 );
    };
leaf\_max\_size
:   The maximum number of points to have in a leaf for not branching
    the tree any more.

HierarchicalClusteringIndexParams
When passing an object of this type the index constructed will be a hierarchical clustering index. This type of index works with any metric distance and can be used for matching binary features using Hamming distances.

    struct HierarchicalClusteringIndexParams : public IndexParams
    {
    
        HierarchicalClusteringIndexParams(int branching = 32,
                                    flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
                                    int trees = 4, int leaf_max_size = 100)
    };
branching
:   The branching factor to use for the hierarchical
    clustering tree.

centers\_init
:   The algorithm to use for selecting the initial centers
    when performing a k-means clustering step. The possible values
    are CENTERS\_RANDOM (picks the initial cluster centers
    randomly), CENTERS\_GONZALES (picks the initial centers using
    Gonzales' algorithm) and CENTERS\_KMEANSPP (picks the initial
    centers using the algorithm suggested in @arthur_kmeanspp_2007).

trees
:   The number of parallel trees to use. Good values are in the
    range [3..8].

leaf\_max\_size
:   The maximum number of points a leaf node should contain.

LshIndexParams
When passing an object of this type the index constructed will be a multi-probe LSH (Locality-Sensitive Hashing) index. This type of index can only be used for matching binary features using Hamming distances.

    struct LshIndexParams : public IndexParams
    {
    
        LshIndexParams(unsigned int table_number = 12, 
                        unsigned int key_size = 20, 
                        unsigned int multi_probe_level = 2);
    };
table\_number
:   The number of hash tables to use.

key\_size
:   The length of the key in the hash tables.

multi\_probe\_level
:   Number of levels to use in multi-probe (0 for standard LSH).
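A sketch of indexing binary descriptors with the LSH index and the Hamming distance functor from dist.h (assuming `descriptors` is a flann::Matrix<unsigned char> with one binary descriptor per row):

    // multi-probe LSH over binary features, compared with Hamming distance
    flann::Index<flann::Hamming<unsigned char> > index(descriptors,
            flann::LshIndexParams(12, 20, 2));
    index.buildIndex();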

AutotunedIndexParams
When passing an object of this type the index created is automatically tuned to offer the best performance, by choosing the optimal index type (randomized kd-trees, hierarchical kmeans, linear) and parameters for the dataset provided.

    struct AutotunedIndexParams : public IndexParams
    {
    
        AutotunedIndexParams( float target_precision = 0.9,
                    float build_weight = 0.01,
                    float memory_weight = 0,
                    float sample_fraction = 0.1 );
    };
target\_precision
:   Is a number between 0 and 1 specifying the percentage of
    the approximate nearest-neighbor searches that return the exact
    nearest-neighbor. Using a higher value for this parameter gives
    more accurate results, but the search takes longer. The optimum
    value usually depends on the application.

build\_weight
:   Specifies the importance of the index build time relative
    to the nearest-neighbor search time. In some applications it's
    acceptable for the index build step to take a long time if the
    subsequent searches in the index can be performed very fast. In
    other applications it's required that the index be built as fast
    as possible even if that leads to slightly longer search
    times.

memory\_weight
:   Is used to specify the tradeoff between time (index build
    time and search time) and memory used by the index. A value less
    than 1 gives more importance to the time spent and a value
    greater than 1 gives more importance to the memory usage.

sample\_fraction
:   Is a number between 0 and 1 indicating what fraction of
    the dataset to use in the automatic parameter configuration
    algorithm. Running the algorithm on the full dataset gives the
    most accurate results, but for very large datasets can take
    longer than desired. In such case using just a fraction of the
    data helps speeding up this algorithm while still giving good
    approximations of the optimum parameters.

SavedIndexParams
This object type is used for loading a previously saved index from the disk.

    struct SavedIndexParams : public IndexParams
    {
    
            SavedIndexParams( std::string filename );
    };
filename
:   The filename in which the index was saved.
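A sketch of saving an index and loading it back in a later run (the file name is arbitrary; the same `dataset` matrix is supplied again when loading, as required by the constructor):

    // build an index and save it to disk
    flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
    index.buildIndex();
    index.save("kdtree.idx");

    // later: load the saved index instead of rebuilding it
    flann::Index<flann::L2<float> > loaded(dataset, flann::SavedIndexParams("kdtree.idx"));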

flann::Index::buildIndex

Builds the nearest neighbor search index. There are two versions of the
buildIndex method, one that uses the points provided as argument and
one that uses the points provided to the constructor when the object was
constructed.

    void buildIndex();        
    void buildIndex(const Matrix<ElementType>& points);

flann::Index::addPoints

The addPoints method can be used to incrementally add points to the
index after the index was built. To avoid the index getting unbalanced,
the addPoints method has the option of rebuilding the index after a
large number of points have been added. The rebuild_threshold
parameter controls when the index is rebuilt; by default this is done
when it doubles in size (rebuild_threshold = 2).

    void addPoints(const Matrix<ElementType>& points, float rebuild_threshold = 2);

flann::Index::removePoint

The removePoint method removes one point with the specified point_id
from the index. The indices (of the remaining points) returned by the
nearest neighbor operations do not change when points are removed from
the index.

    void removePoint(size_t point_id);

flann::Index::getPoint

The getPoint method returns a pointer to the data point with the specified point_id.

    ElementType* getPoint(size_t point_id);
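A sketch of incremental updates, assuming an `index` built over `dataset` as in the Quick Start example and a second matrix `more_points` with the same dimensionality (the point ids used below are arbitrary):

    // append points; with rebuild_threshold = 2 the index is rebuilt
    // automatically once it grows to twice the size it was built with
    index.addPoints(more_points, 2);

    // remove one point; the ids of the remaining points do not change
    index.removePoint(7);

    // access the raw data of a point through its id
    float* p = index.getPoint(42);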

flann::Index::knnSearch

Performs a K-nearest neighbor search for a set of query points (that is,
for each query point the k closest points in the dataset are returned).
There are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will be resized automatically as needed.

    int Index::knnSearch(const Matrix<ElementType>& queries,
            Matrix<int>& indices, 
            Matrix<DistanceType>& dists,
            size_t knn,
            const SearchParams& params);
    int Index::knnSearch(const Matrix<ElementType>& queries,
                    std::vector< std::vector<int> >& indices,
                    std::vector<std::vector<DistanceType> >& dists,
                    size_t knn,
                    const SearchParams& params);
queries
:   Matrix containing the query points. Size of matrix is
    $num\_queries \times dimensionality$.

indices
:   Matrix that will contain the indices of the K-nearest
    neighbors found (size should be at least $num\_queries \times knn$
    for the pre-allocated version).

dists
:   Matrix that will contain the distances to the K-nearest
    neighbors found (size should be at least $num\_queries \times knn$
    for the pre-allocated version).
    *Note:* For Euclidean distances, the `flann::L2` functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

knn
:   Number of nearest neighbors to search for.

params
:   Search parameters. Structure containing parameters used
    during search.

SearchParams

    struct SearchParams
    {
    
        SearchParams(int checks = 32,
            float eps = 0,
            bool sorted = true);
        int checks;
        float eps;
        bool sorted;
        int max_neighbors;
        tri_type use_heap;
        int cores;
        bool matrices_in_gpu_ram;
    };
checks
:   specifies the maximum leafs to visit when searching for
    neighbours. A higher value for this parameter would give better
    search precision, but also take more time. For all leafs to be
    checked use the value `FLANN_CHECKS_UNLIMITED`. If automatic
    configuration was used when the index was created, the number of
    checks required to achieve the specified precision was also
    computed, to use that value specify `FLANN_CHECKS_AUTOTUNED`.

eps
:   Search for eps-approximate neighbors (only used by
    KDTreeSingleIndex and KDTreeCuda3dIndex).

sorted
:   Used only by radius search, specifies if the neighbors returned
    should be sorted by distance.

max\_neighbors
:   Specifies the maximum number of neighbors radius search should
    return (default: -1 = unlimited). Only used for radius search.

use\_heap
:   Use a heap data structure to manage the list of neighbors
    internally (possible values: FLANN\_False, FLANN\_True,
    FLANN\_Undefined). A heap is more efficient for a large number
    of neighbors and less efficient for a small number of neighbors.
    Default value is FLANN\_Undefined, which lets the code choose
    the best option depending on the number of neighbors requested.
    Only used for KNN search.

cores
:   How many cores to assign to the search (specify 0 for automatic
    core selection).

matrices\_in\_gpu\_ram
:   For GPU search, indicates if matrices are already in GPU ram.
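Putting these fields together, a minimal sketch of a customized SearchParams object (the values are illustrative only), which can then be passed to knnSearch or radiusSearch:

    flann::SearchParams params(64);   // visit at most 64 leaves per query
    params.eps = 0.0f;                // eps-approximation (KDTreeSingleIndex/KDTreeCuda3dIndex only)
    params.sorted = true;             // sort radius-search results by distance
    params.max_neighbors = 100;       // cap radius-search results (-1 = unlimited)
    params.cores = 0;                 // use all available cores (requires OpenMP)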

flann::Index::radiusSearch

Performs a radius nearest neighbor search for a set of query points.
There are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will be resized automatically as needed.

    int Index::radiusSearch(const Matrix<ElementType>& queries,
              Matrix<int>& indices,
              Matrix<DistanceType>& dists,
              float radius,
              const SearchParams& params); 
    int Index::radiusSearch(const Matrix<ElementType>& queries,
                      std::vector< std::vector<int> >& indices,
                      std::vector<std::vector<DistanceType> >& dists,
                      float radius,
                      const SearchParams& params);
queries
:   Matrix containing the query points. Size of matrix is
    $num\_queries \times dimensionality$.

indices
:   Matrix that will contain the indices of the nearest
    neighbors found. For the pre-allocated version, only as many
    neighbors are returned as there are columns in this matrix. If fewer
    neighbors are found than there are columns in this matrix, the element after
    the last index returned is -1. In case of the std::vector version,
    the rows will be resized as needed to fit all the neighbors to be
    returned, except if the "max_neighbors" search parameter is
    set.

dists
:   Matrix that will contain the distances to the nearest
    neighbors found. The same number of values are returned here as for
    the indices matrix.
    *Note:* For Euclidean distances, the flann::L2 functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

radius
:   The search radius.
    *Note:* For Euclidean distances, the flann::L2 functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

params
:   Search parameters.

The method returns the number of nearest neighbors found.
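A sketch of a radius search using the vector-based overload (`index` and `query` as in the Quick Start example). Because flann::L2 works on squared distances, the squared radius is passed:

    std::vector<std::vector<int> > indices;
    std::vector<std::vector<float> > dists;
    // search within a radius of 0.5 in Euclidean terms, i.e. 0.25 squared
    int count = index.radiusSearch(query, indices, dists, 0.5f * 0.5f,
                                   flann::SearchParams(128));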

flann::Index::save

Saves the index to a file.

    void Index::save(std::string filename);
filename
:   The file to save the index to.

flann::hierarchicalClustering

Clusters the given points by constructing a hierarchical k-means tree
and choosing a cut in the tree that minimizes the clusters’ variance.

    template <typename Distance>
    int hierarchicalClustering(const Matrix<typename Distance::ElementType>& points,
        Matrix<typename Distance::ResultType>& centers,
        const KMeansIndexParams& params,
        Distance d = Distance());
points
:   The points to be clustered.

centers
:   The centers of the clusters obtained. The number of rows in
    this matrix represents the number of clusters desired. However,
    because of the way the cut in the hierarchical tree is chosen, the
    number of clusters computed will be the highest number of the form
    $(branching-1)*k+1$ that's lower than the number of clusters
    desired, where $branching$ is the tree's branching factor (see
    the description of the KMeansIndexParams).

params
:   Parameters used in the construction of the hierarchical
    k-means tree.

The function returns the number of clusters computed.
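A sketch of clustering a dataset into up to 100 clusters (because of the cut rule described above, the number actually computed may be smaller; `dataset` as in the Quick Start example):

    // space for up to 100 cluster centers, one per row
    flann::Matrix<float> centers(new float[100 * dataset.cols], 100, dataset.cols);
    int n_clusters = flann::hierarchicalClustering<flann::L2<float> >(
            dataset, centers, flann::KMeansIndexParams(32, 11));
    // only the first n_clusters rows of centers are valid
    delete[] centers.ptr();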

flann::KdTreeCuda3dIndex

FLANN provides a CUDA implementation of the kd-tree build and search
algorithms to improve the build and query speed for large 3d data sets.
This section will provide all the necessary information to use the
KdTreeCuda3dIndex index type.

Building: If CMake detects a CUDA install during the build (see the
section on downloading and compiling FLANN above), a library libflann_cuda.so
will be built.

Basic Usage: To be able to use the new index type, you have to include the FLANN header this way:

    #define FLANN_USE_CUDA
    #include <flann/flann.hpp>

If you define the symbol FLANN_USE_CUDA before including the FLANN
header, you will have to link libflann_cuda.so or libflann_cuda_s.a
with your project. However, you will not have to compile your source
code with nvcc, except if you use other CUDA code, of course.

You can then create your index by using the KDTreeCuda3dIndexParams to
create the index. The index will take care of copying all the data from
and to the GPU for you, both for index creation and search.

A word of caution: A general guideline for deciding whether to use the
CUDA kd tree or a (multi-threaded) CPU implementation is hard to give,
since it depends on the combination of CPU and GPU in each system and
the data sets. For example, on a system with a Phenom II 955 CPU and a
Geforce GTX 260 GPU, the maximum search speedup on a synthetic (random)
data set is a factor of about 8-9 vs the single-core CPU search, and
starts to be reached at about 100k search and query points. (This
includes transfer times.) Build time does not profit as much from the
GPU acceleration; here the benefit is about 2x at 200k points, but this
largely depends on the data set. The only way to know which
implementation is best suited is to try it.

Advanced Usage: In some cases, you might already have your data in a
buffer on the GPU. In this case, you can re-use these buffers instead of
copying the buffers back to system RAM for index creation and search.
The way to do this is to pass GPU pointers via the flann::Matrix
inputs and tell the index via the index and search params to treat the
pointers as GPU pointers.

    thrust::device_vector<float4>  points_vector( n_points );
    // ... fill vector here...
    float* gpu_pointer = (float*)thrust::raw_pointer_cast(&points_vector[0]);
    flann::Matrix<float> matrix_gpu(gpu_pointer,n_points,3, 4);
    flann::KDTreeCuda3dIndexParams params;
    params["input_is_gpu_float4"]=true;
    flann::Index<flann::L2<float> > flannindex( matrix_gpu, params  );
    flannindex.buildIndex();

First, a GPU buffer of float4 values is created and filled with points.

Then, a GPU pointer to the buffer is stored in a flann matrix with 3 columns and a stride of 4 (the last element in the float4 should be zero).

Last, the index is created. The input_is_gpu_float4 flag tells the index to treat the matrix as a gpu buffer.

Similarly, you can specify GPU buffers for the search routines that
return the result via flann matrices (but not for those that return them
via std::vectors). To do this, the pointers in the index and dist
matrices have to point to GPU buffers and the cols value has to be set
to 3 (as we are dealing with 3d points). Here, any value for stride
can be used.

    flann::Matrix<int> indices_gpu(gpu_pointer_indices,n_points, knn, stride);
    flann::Matrix<float> dists_gpu(gpu_pointer_dists,n_points, knn, stride);
    flann::SearchParams params;
    params.matrices_in_gpu_ram = true;
    flannindex.knnSearch( queries_gpu ,indices_gpu,dists_gpu,knn, params);

Note that you cannot mix matrices in system RAM and GPU RAM here!

Search Parameters:
The search routines support the following parameters:

  • eps - used for approximate knn search. The maximum possible error
    is $e = d_{best} * eps$, i.e. the distance of the returned neighbor
    is at maximum $eps$ times larger than the distance to the real best
    neighbor.

  • use_heap - used in knn and radius search. If set to true, a heap
    structure will be used in the search to keep track of the distance
    to the farthest neighbor. Beneficial with about $k > 64$ elements.

  • sorted - if set to true, the results of the radius and knn
    searches will be sorted in ascending order by their distance to the
    query point.


  • matrices_in_gpu_ram - set to true to indicate that all (input and
    output) matrices are located in GPU RAM.


Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original article: https://blog.csdn.net/weixin_45687825/article/details/110881552
