OCR
Extract text and figures from scanned PDFs using the native desktop OCR engine.
Desktop only. OCR uses the native desktop engine and is not available in the web app.
OCR (optical character recognition) extracts readable text from scanned PDFs and image-based documents. Access it from the OCR side panel inside the PDF Viewer.
Opening the OCR panel
- Open a PDF file in the PDF Viewer.
- Click the OCR button in the viewer toolbar. The OCR side panel slides in from the right.
Extracting text
Click Extract Text in the OCR panel to run the engine on the current PDF. Omnilib processes the document page by page and streams each page's extracted text into the panel as soon as that page finishes.
For multi-page documents, the panel shows a progress indicator and a page-by-page breakdown. Each page's extracted text is displayed in order, separated by a divider.
Tip: OCR works best on cleanly scanned documents. Low-resolution or skewed scans reduce recognition accuracy. If possible, scan at 300 DPI or higher before importing into Omnilib.
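If you are unsure whether an existing scan meets the 300 DPI guideline, you can estimate its resolution from the image's pixel dimensions and the physical page size. The sketch below is only an illustration and is not part of Omnilib: the function names are hypothetical, and it assumes a US Letter page (8.5 × 11 inches) unless you pass another size.

```python
def estimated_dpi(width_px: int, height_px: int,
                  page_width_in: float = 8.5,
                  page_height_in: float = 11.0) -> float:
    """Estimate scan resolution as pixels per inch.

    Returns the lower of the horizontal and vertical values, since the
    weaker axis limits how much detail the scan holds.
    Assumption: the scanned image covers the full physical page.
    """
    return min(width_px / page_width_in, height_px / page_height_in)


def meets_recommended_dpi(width_px: int, height_px: int,
                          threshold: float = 300.0) -> bool:
    """Check a Letter-size scan against the 300 DPI guideline from the tip."""
    return estimated_dpi(width_px, height_px) >= threshold


if __name__ == "__main__":
    # A Letter page scanned at 2550 x 3300 pixels
    print(estimated_dpi(2550, 3300))
    print(meets_recommended_dpi(1275, 1650))
```

A scan that falls below the threshold is usually worth rescanning at a higher setting rather than upscaling, because upscaling adds pixels without adding detail.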
Figure extraction
The OCR engine also identifies and extracts figures, charts, and diagrams from the PDF. Extracted figures appear as image thumbnails in the panel, below the text output.
Each extracted figure is labeled with the page number it came from. Click a thumbnail to see the full-size image. Save figures individually by right-clicking a thumbnail and selecting Save Image.
Extracted figures are also available in the Literature Review Generator when you generate a review from a research session that includes this PDF.
Using extracted text
Extracted text integrates with the rest of Omnilib:
- Global search — Extracted text is indexed and searchable from the project search bar and the command palette.
- AI context — The AI agent can read extracted text when you add the PDF as context. For scanned PDFs, OCR text is used automatically in place of the raw PDF stream.
- Copy — Select any portion of the extracted text in the panel and copy it to the clipboard.
Saving extracted text
Click Save as File in the OCR panel toolbar to save the full extracted text as a .txt file in your project. The file is named after the source PDF with a -ocr suffix.
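If you script against your project folder, it can help to predict the name of the saved file. The sketch below is a guess at the naming rule, not Omnilib's actual implementation: it assumes the -ocr suffix is appended to the PDF's base name and that the .pdf extension is replaced with .txt. The function name `ocr_output_name` is hypothetical.

```python
from pathlib import PurePath


def ocr_output_name(pdf_name: str) -> str:
    """Derive the expected OCR text file name for a source PDF.

    Assumption: "<name>.pdf" is saved as "<name>-ocr.txt".
    """
    stem = PurePath(pdf_name).stem  # file name without its final extension
    return f"{stem}-ocr.txt"


if __name__ == "__main__":
    print(ocr_output_name("thesis-scan.pdf"))
```

Check the name of an actual saved file in your project before relying on this pattern.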