Parsing PDFs using Tika and Nokogiri and indexing them for searching using Algolia

by Omar Bahareth — 26 January 2017

A step-by-step post I wrote showing how to use Apache Tika to extract text from a PDF file (Tika supports over a thousand other file types too), parse the resulting HTML with Nokogiri, and then index the extracted data in Algolia to make it searchable.