DocuProc

Process non-OCR PDFs into parsable JSON with ease.

Problem

PDF is a robust document format, but the source material is often not searchable. Indexing and processing a massive set of PDFs is difficult, especially when they have not been through OCR.

Solution

An automated process that runs OCR locally on PDFs and indexes the extracted information into a searchable database.
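
The flow described above can be sketched roughly as follows. This is a minimal illustration, not DocuProc's actual implementation: the function names (`build_index`, `search`, `export_json`), the table layout, and the choice of SQLite are all assumptions made for the example. The OCR step is passed in as a hypothetical callable that takes a PDF path and returns the text of each page, so any local OCR engine could be plugged in.

```python
import json
import re
import sqlite3
from typing import Callable, Iterable, List

# Hypothetical OCR hook (an assumption, not part of DocuProc):
# takes a PDF path and returns one string of text per page.
OcrFn = Callable[[str], List[str]]


def build_index(pdf_paths: Iterable[str], ocr: OcrFn,
                db_path: str = ":memory:") -> sqlite3.Connection:
    """Run OCR on each PDF and store page text plus a simple word index."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (doc TEXT, page INTEGER, body TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS terms (term TEXT, doc TEXT, page INTEGER)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_terms ON terms (term)")
    for path in pdf_paths:
        for page_no, text in enumerate(ocr(path), start=1):
            conn.execute("INSERT INTO pages VALUES (?, ?, ?)", (path, page_no, text))
            # Index each distinct lowercase word once per page.
            words = set(re.findall(r"[a-z0-9]+", text.lower()))
            conn.executemany(
                "INSERT INTO terms VALUES (?, ?, ?)",
                [(word, path, page_no) for word in words],
            )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, word: str) -> List[dict]:
    """Return every (doc, page) whose OCR text contains the given word."""
    rows = conn.execute(
        "SELECT doc, page FROM terms WHERE term = ? ORDER BY doc, page",
        (word.lower(),),
    ).fetchall()
    return [{"doc": doc, "page": page} for doc, page in rows]


def export_json(conn: sqlite3.Connection) -> str:
    """Dump all indexed pages as a JSON array of records."""
    rows = conn.execute("SELECT doc, page, body FROM pages ORDER BY doc, page")
    return json.dumps([{"doc": d, "page": p, "text": t} for d, p, t in rows])
```

In this sketch a caller would supply a real OCR function in place of the hook, call `build_index` over a directory of PDFs, and then query with `search(conn, "invoice")` or write out `export_json(conn)`. Exact-word matching is used only to keep the example short; a production index would likely need stemming or full-text search.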