You can run and correct the OCR (Optical Character Recognition) process for PDFs in Adobe Acrobat Pro DC. The OCR process creates searchable text, an important accessibility feature for digital documents. Running OCR yourself can be useful if the Ally tool built into Canvas is unable to generate an "OCRed PDF" version of your document in its Alternative Formats menu.
You can download Adobe Acrobat Pro DC for free as part of Adobe Creative Cloud.
Here are the steps for running and correcting OCR in Acrobat:
- Open the PDF in Adobe Acrobat Pro DC.
- Click Scan and OCR in the right-hand panel.
- Click Recognize Text in the top toolbar that appears.
- Click In This File in the dropdown menu.
- Click Settings in the toolbar that appears.
  For “Output,” choose one of the following:
  - If file size matters, select “Searchable Image” and choose “600 dpi” for “Downsample To.” This will reduce the file size after running OCR.
  - If file size doesn’t matter, select “Searchable Image (Exact).” This will result in the highest fidelity to the original document after running OCR.
- Click the blue Recognize Text button to initiate OCR.
- After the process finishes, click Recognize Text in the top toolbar again.
- Click Correct Recognized Text in the dropdown menu.
- Any text that Acrobat suspects might be misrecognized will appear in red boxes. Click on any text in a red box to see a comparison between the text’s image and what it is being recognized as. Type in the Recognized As box to correct the text as needed, then click Accept to save your edit.
- There may be recognition errors that are not in red boxes. If you check the box for Review recognized text at the top left, Acrobat will display the searchable text layer instead of the page image. Doing this will not change your document's appearance to readers. You can then double-click on any word to place a red box around it and correct its recognition. Unchecking the Review box or clicking Cancel will return your display to the page image.
- Continue this process until all text is recognized correctly. You can ignore any red-boxed text that is already recognized correctly. Save the PDF to finish. It’s recommended to use Save As to create a new version of the file.
If your document has too many recognition errors to fix, you may need to find a born-digital version (check the Iwasaki Library) or create a higher-quality scan. If neither option is possible, you can submit the library's Reserves Request Form.
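
As a rough illustration of why the “Downsample To” choice in the Settings step affects file size, the sketch below counts the pixels in one scanned page before and after downsampling. It is only an approximation under stated assumptions: the page is US Letter (8.5 x 11 inches), the function names are made up for this example, and pixel count is a stand-in for file size, since the actual size also depends on how Acrobat compresses the images.

```python
def pixel_count(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Approximate number of pixels in one scanned page at the given resolution.

    The default page size (US Letter) is an assumption for illustration.
    """
    return round(width_in * dpi) * round(height_in * dpi)


def downsample_ratio(source_dpi: int, target_dpi: int = 600) -> float:
    """Fraction of the original pixels that remain after downsampling.

    Downsampling only lowers resolution, so a scan at or below the
    target is treated here as unchanged (ratio 1.0).
    """
    if source_dpi <= target_dpi:
        return 1.0
    return pixel_count(target_dpi) / pixel_count(source_dpi)


if __name__ == "__main__":
    # Compare a few hypothetical scan resolutions against a 600 dpi target.
    for dpi in (300, 600, 1200):
        print(dpi, pixel_count(dpi), downsample_ratio(dpi))
```

Under these assumptions, a 1200 dpi scan keeps about a quarter of its pixels when downsampled to 600 dpi, while a 300 dpi scan would see no reduction from this setting.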