Lucene Review

One of the best open source data indexing tools available, with support for popular file formats such as PDF and HTML.

Valuable Features:

- A very good product for indexing huge volumes of data, with very fast response times for search queries.
- Apart from the originally supported Java platform, it can be integrated with other platforms as well, including Delphi, Perl, C#, C++, Python, Ruby, and PHP.
- For developers, there is very good community support available on the web, such as forums and mailing lists.
- Being open source, it lets developers extend or customize the underlying code base to suit their needs.
- Capable of indexing different types of files, such as PDF, HTML, and Microsoft Word.
- Supports indexing of UTF-encoded (Unicode) data, meaning it can index any text that Unicode can encode, regardless of the language it is written in.
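
To illustrate why searches against an index come back quickly and why the language of the text does not matter, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's implementation and does not use Lucene's API; the class name TinyInvertedIndex and its methods are hypothetical, chosen only for this example. Each term maps to the set of documents that contain it, so a search is a single map lookup rather than a scan of every document.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index. Hypothetical illustration only, NOT how Lucene is implemented.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Normalize to NFC, lowercase, and split on anything that is not a
    // Unicode letter, combining mark, or digit, so non-English text works too.
    static List<String> tokenize(String text) {
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFC)
                .toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        for (String term : normalized.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Record every term of the text under the given document id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Single-term lookup: returns the ids of documents containing the term.
    List<Integer> search(String term) {
        List<String> terms = tokenize(term);
        if (terms.isEmpty()) {
            return new ArrayList<>();
        }
        TreeSet<Integer> hits = postings.get(terms.get(0));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene indexes PDF and HTML documents");
        // Russian "Privet, mir" written with Unicode escapes
        index.add(2, "\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440");
        index.add(3, "HTML search with Lucene");

        System.out.println(index.search("lucene"));
        // Uppercase Russian "MIR" still matches after lowercasing
        System.out.println(index.search("\u041c\u0418\u0420"));
    }
}
```

A real Lucene index adds much more on top of this idea (analyzers, scoring, on-disk segments), so treat the sketch as a mental model rather than a substitute.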

Room for Improvement:

- For Java users, there is a performance penalty, because the JVM (Java Virtual Machine) is well known for consuming a lot of memory. Scalability is an issue as well.
- If you need to add custom algorithms for indexing data, you might face some difficulty, as there is not much information available in either the Lucene forums or the mailing lists.
- Though community support is excellent for Java users, it is harder to get solutions on other platforms such as Perl and Delphi, as the tool is still not that stable on these platforms and is still in the incubation phase.

Other Advice:

Having personally used Lucene for indexing HTML and PDF documents, I find it to be a very good data indexing tool. With Java as the primary development platform, it was much easier to work with. However, I had to struggle a bit when there was an immediate need to add a custom indexing algorithm for a particular non-English language.
Disclosure: I am a real user, and this review is based on my own experience and opinions.