DocuProc
![Process non-OCR PDFs with ease into parsable JSON](/projects/docuproc/featured-image.webp)
Problem
PDF is a robust document format, but its content is often not searchable: a scanned, non-OCR PDF contains only images of text, with no text layer to query. With a large collection of documents, indexing and processing every PDF becomes difficult, especially when most of them are non-OCR.
Solution
An automated pipeline that runs OCR on PDFs locally and indexes the extracted text into a searchable database.
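A minimal sketch of the indexing half of such a pipeline follows. It is not DocuProc's implementation. It assumes that OCR has already produced plain text for each page (in practice a local engine such as Tesseract would do this), that pages are stored as JSON records, and that a simple in-memory inverted index stands in for the searchable database. The names `to_json_records`, `build_index`, and `search` are hypothetical.

```python
import json
import re
from collections import defaultdict

# Assumed input: OCR output already extracted locally, one string per page.
# A real pipeline would render each PDF page to an image and run an OCR engine on it.
OcrPages = dict[str, list[str]]  # file name -> list of page texts

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def to_json_records(pages: OcrPages) -> str:
    """Serialize OCR output as a JSON list with one record per page (pages are 1-based)."""
    records = [
        {"file": name, "page": i + 1, "text": text}
        for name, texts in pages.items()
        for i, text in enumerate(texts)
    ]
    return json.dumps(records, indent=2)


def build_index(records: list[dict]) -> dict[str, set[tuple[str, int]]]:
    """Inverted index: map each token to the set of (file, page) locations containing it."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for rec in records:
        for token in tokenize(rec["text"]):
            index[token].add((rec["file"], rec["page"]))
    return index


def search(index: dict[str, set[tuple[str, int]]], query: str) -> list[tuple[str, int]]:
    """Return sorted locations containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical OCR results for two scanned documents.
    ocr_output = {
        "invoice_001.pdf": ["Invoice total due: 420 USD", "Payment terms: net 30"],
        "contract.pdf": ["Service contract", "Payment due within 30 days"],
    }
    records = json.loads(to_json_records(ocr_output))
    index = build_index(records)
    print(search(index, "payment 30"))  # [('contract.pdf', 2), ('invoice_001.pdf', 2)]
```

Keeping the JSON records as the intermediate format means the OCR step and the indexing step can run separately; the in-memory index could be swapped for a persistent full-text store without changing the record layout.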