RubyFlow The Ruby and Rails community linklog



Parsing PDFs using Tika and Nokogiri and indexing them for searching using Algolia

A step-by-step blog post I wrote showing how to extract text from a PDF file with Apache Tika (which also supports many other file types), parse the resulting HTML with Nokogiri, and then index the extracted data with Algolia so it is easily searchable.
