Parsing PDFs using Tika and Nokogiri and indexing them for searching using Algolia
A step-by-step blog post I wrote showing how to use Apache Tika to extract text from a PDF file (Tika supports over a thousand other file types), parse the resulting HTML with Nokogiri, and index the extracted data in Algolia so it is easily searchable.