Apache Lucene - The Powerful Text Search Library

October 01, 2013

Features:

Full-text search library
Lets you add metadata to an index
Returns results for a query ranked by relevance, or sorted by a field you specify
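To make the last feature concrete, here is a toy Python sketch, not Lucene's API, of the difference between ranking by relevance and sorting by a field. The documents, the `search` function, and the scoring rule (counting how often the term appears) are hypothetical simplifications; recent Lucene versions score results with BM25 by default.

```python
# Hypothetical documents; "body" holds the text being searched.
docs = [
    {"title": "Intro to Search", "date_published": "2013", "body": "search engines search text"},
    {"title": "Databases", "date_published": "2012", "body": "tables rows and search"},
    {"title": "Cooking", "date_published": "2011", "body": "recipes and ovens"},
]

def score(doc, term):
    """Toy relevance score: how many times the term appears in the body."""
    return doc["body"].lower().split().count(term.lower())

def search(term, sort_field=None):
    """Return matching documents, by relevance unless a sort field is given."""
    hits = [d for d in docs if score(d, term) > 0]
    if sort_field is None:
        # Most relevant (highest score) first.
        return sorted(hits, key=lambda d: score(d, term), reverse=True)
    # Ignore relevance and order by the requested field instead.
    return sorted(hits, key=lambda d: d[sort_field])

print([d["title"] for d in search("search")])                    # ['Intro to Search', 'Databases']
print([d["title"] for d in search("search", "date_published")])  # ['Databases', 'Intro to Search']
```

The same hits come back either way; only the order changes.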

Why Use Lucene?

It's smarter than just searching text. Imagine opening your old high-school history textbook and reading it page by page to find a particular word. If that sounds tiresome, why make your computer waste time the same way and end up with a slow runtime? Now imagine turning to the index at the back of the book, which lists the page numbers for the topic you're looking for. Lucene works on the same idea: it doesn't scan raw text, it searches an index.

What Does an Index Contain?

An index contains one or more documents, and each document is a set of fields that describe a piece of data. If you were indexing a database table, each row would become a single document. All of the documents are then stored in the index. Documents don't just hold raw data; they hold metadata such as a book's title, publisher, release date, and more. Lucene uses an inverted index, which maps each term to the documents that contain it, so it can search millions of documents quickly.
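The inverted index idea can be sketched in a few lines of Python. This is a toy model under simple assumptions, not how Lucene actually stores its index: the sample books are made up, and "analysis" here is just lowercasing and splitting on white space. Each (field, term) pair maps to the ids of the documents that contain it, so finding matches is a dictionary lookup rather than a scan of every document.

```python
from collections import defaultdict

# Each "document" is a dict of field name -> text, like one row of a table.
# The books below are made up for illustration.
documents = [
    {"title": "A History of Rome", "publisher": "Acme Press", "date_published": "2010"},
    {"title": "Rome and Carthage", "publisher": "Bookworks", "date_published": "2012"},
    {"title": "Modern History", "publisher": "Acme Press", "date_published": "2013"},
]

def build_inverted_index(docs):
    """Map (field, term) -> sorted list of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, text in doc.items():
            # Naive analysis: lowercase, then split on white space.
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}

index = build_inverted_index(documents)

# One lookup instead of reading every document:
print(index[("title", "rome")])      # [0, 1]
print(index[("publisher", "acme")])  # [0, 2]
```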

How Do You Add a Document to an Index and Search for It?

Before you can search for a document, you first need to add it to the index with an IndexWriter. Once it has been added, you can use an IndexSearcher to find it.
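The add-then-search workflow can be sketched with three hypothetical classes whose names only echo Lucene's roles: `ToyIndexWriter` buffers documents until `commit()`, and `ToyIndexSearcher` looks terms up in the shared in-memory `ToyIndex`. This is a simplified Python illustration, not Lucene's Java API; the buffering only loosely mirrors how Lucene documents typically become visible to searchers after a commit or reader refresh.

```python
from collections import defaultdict

class ToyIndex:
    """Hypothetical in-memory stand-in for an index."""
    def __init__(self):
        self.docs = []                    # stored fields, by document id
        self.postings = defaultdict(set)  # (field, term) -> document ids

class ToyIndexWriter:
    """Hypothetical writer, playing the role Lucene's IndexWriter plays."""
    def __init__(self, index):
        self.index = index
        self.pending = []

    def add_document(self, doc):
        # Buffered only; not searchable until commit().
        self.pending.append(doc)

    def commit(self):
        for doc in self.pending:
            doc_id = len(self.index.docs)
            self.index.docs.append(doc)
            for field, text in doc.items():
                for term in text.lower().split():
                    self.index.postings[(field, term)].add(doc_id)
        self.pending.clear()

class ToyIndexSearcher:
    """Hypothetical searcher, playing the role Lucene's IndexSearcher plays."""
    def __init__(self, index):
        self.index = index

    def search(self, field, term):
        ids = self.index.postings.get((field, term.lower()), set())
        return [self.index.docs[i] for i in sorted(ids)]

index = ToyIndex()
writer = ToyIndexWriter(index)
writer.add_document({"title": "Search Basics", "publisher": "Acme Press"})
writer.add_document({"title": "Advanced Search", "publisher": "Bookworks"})
writer.commit()  # make the buffered documents searchable

searcher = ToyIndexSearcher(index)
for doc in searcher.search("title", "search"):
    print(doc["title"])  # Search Basics, then Advanced Search
```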

Is Lucene Query Syntax Different?

Yes. Its syntax kind of reminds me of JSON, except without the curly braces:

field: value

Note: field can be the name of any field found in a document, for example title, publisher, or date_published. Value is the value stored in that field, just like the value half of a key-value pair.

If the value contains multiple words (that is, white space), wrap it in quotes so it is searched as a single phrase:

field: "value value"
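Here is a minimal sketch of how the two forms above could be told apart, assuming a made-up regular-expression grammar that only understands a single `field: value` or `field: "value value"` clause. The `parse_clause` helper is hypothetical and is not Lucene's QueryParser, which also handles operators, escapes, and much more.

```python
import re

# Hypothetical grammar: field ":" optional spaces, then a "quoted phrase" or one word.
CLAUSE = re.compile(r'(\w+):\s*(?:"([^"]*)"|(\S+))')

def parse_clause(query):
    """Return (field, [terms]) for one `field: value` or `field: "value value"` clause."""
    match = CLAUSE.fullmatch(query.strip())
    if match is None:
        # Unquoted multi-word values are simply rejected in this small sketch.
        raise ValueError(f"not a single field: value clause: {query!r}")
    field, phrase, word = match.groups()
    value = phrase if phrase is not None else word
    return field, value.lower().split()

print(parse_clause("title: lucene"))                 # ('title', ['lucene'])
print(parse_clause('title: "text search library"'))  # ('title', ['text', 'search', 'library'])
```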

You can also refine your search with boolean operators (AND, OR, NOT), wildcards (such as te?t or test*), ranges (such as date_published:[2010 TO 2013]), and more.
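Conceptually, the boolean operators behave like set operations on the lists of matching documents, a wildcard expands into every indexed term that fits the pattern, and a range keeps the documents whose value falls between two bounds. The toy Python sketch below assumes hypothetical postings and years for a single field and uses the standard `fnmatch` module for `?` and `*`; it ignores scoring and is not how Lucene executes queries internally.

```python
from fnmatch import fnmatchcase

# Hypothetical postings for one field: term -> ids of documents containing it.
postings = {
    "lucene": {1, 2, 4},
    "search": {1, 3, 4},
    "java": {2, 4},
    "test": {3},
    "text": {1, 5},
}

def docs_for(term):
    """Return the set of document ids for a term (empty if the term is unknown)."""
    return postings.get(term, set())

# Boolean operators as set operations.
print(sorted(docs_for("lucene") & docs_for("search")))  # lucene AND search: [1, 4]
print(sorted(docs_for("lucene") | docs_for("test")))    # lucene OR test: [1, 2, 3, 4]
print(sorted(docs_for("lucene") - docs_for("java")))    # lucene NOT java: [1]

def wildcard(pattern):
    """Union of the documents for every indexed term matching the pattern."""
    matched = set()
    for term in postings:
        if fnmatchcase(term, pattern):  # '?' = one character, '*' = any run
            matched |= postings[term]
    return matched

print(sorted(wildcard("te?t")))  # matches test and text: [1, 3, 5]

# Hypothetical publication years, for a range like date_published:[2010 TO 2013].
years = {1: "2009", 2: "2011", 3: "2013", 4: "2012", 5: "2014"}

def year_range(low, high):
    """Inclusive range; string comparison works because every year has four digits."""
    return {doc for doc, year in years.items() if low <= year <= high}

print(sorted(year_range("2010", "2013")))  # [2, 3, 4]
```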

Now you have a good foundation in Apache Lucene.