To analyze data that's locked in a PDF file, you first need to get it into a spreadsheet. See how seven PDF to Excel converters fared in comparison tests on progressively more difficult source files.
Ideally, the data we need to analyze would be available in a ready-to-use format. In the world we live in, though, a lot of important data is locked inside Portable Document Format (PDF) files. How do you get that data out of PDFs and into an Excel spreadsheet? There are a number of PDF to Excel converters to choose from.
The options include software from well-known vendors like Adobe and Microsoft, task-specific cloud services like PDFTables, general-purpose cloud platforms like Amazon's AWS, and even free open-source tools.
Which PDF to Excel converter is the "best"? As with the "best computer," the answer depends on your specific circumstances.
When choosing a PDF converter, there are a number of crucial factors to take into account.
PDF files come in two types. The first is generated by an application such as Microsoft Word; the second comes from a scanned document or other image file. To tell which one you have, try to highlight some text on the page. If you can highlight text with a click and drag, your PDF was generated by an app. If you can't, you have a scan. Some PDF conversion tools do not support scanned PDFs.
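The highlight test can also be approximated in a script. The following is a minimal sketch, not a definitive check: it assumes you have already extracted the text of each page with some PDF library, and it treats a file with almost no extractable text as a probable scan. The 20-character threshold is an arbitrary assumption, and the optional `page_texts_with_pypdf` helper assumes the third-party `pypdf` package is installed.

```python
from typing import Iterable, List


def looks_scanned(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Guess whether a PDF is a scan from the text extracted from each page.

    Heuristic (an assumption, not a standard): if no page yields at least
    `min_chars` non-whitespace characters, there is probably no text layer.
    """
    for text in page_texts:
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) >= min_chars:
            return False  # found selectable text, so likely app-generated
    return True


def page_texts_with_pypdf(path: str) -> List[str]:
    """Optional helper; assumes the third-party `pypdf` package is installed."""
    from pypdf import PdfReader  # imported lazily so the heuristic works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

Note that a scan that has already been run through OCR carries a text layer, so this heuristic would report it as app-generated.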
Almost every tool can handle a simple one-page table. Things get more challenging when tables are spread across multiple pages, when table cells are merged, or when some of the data in a table cell wraps onto multiple lines.
If you frequently run batch conversions, our top pick for app-generated PDFs might not be the best option for you.
And, as with any software choice, you'll need to decide how much you value performance versus price and ease of use.
To help you decide which is best for your tasks, we tested seven PDF to Excel conversion tools using four different PDF files, ranging from simple to difficult. You'll see how each tool performed in each case and learn about its strengths and weaknesses.
The tools we tested are listed below, starting with our top overall performers (keep in mind that "best" depends in part on the specific source document). Ratings range from "Excellent" to "Good," which means every tool performed well on at least some of our tasks.
You'd expect Adobe to do well at PDF parsing, since it created the Portable Document Format standard, and it does. There is a fairly expensive full-featured conversion subscription, but there is also a low-cost $2/month option (an annual subscription is required) that offers unlimited PDF to Excel conversions. (The tool can also export to Microsoft Word.)
On pages that contain both text and tables, the text is converted to Excel along with the tables. That's an advantage if you want to keep the context, and a drawback if you just need the data for further analysis.
- Score: Excellent; the clear winner for non-scanned PDFs.
- Cost: $24 per year.
- Pros: Outstanding results; handles tables spanning several pages well; unlimited conversions of files up to 100MB; reasonably priced for frequent users.
- Cons: Pricey if you only convert a few documents a year; no built-in scripting or automation.
- Conclusion: An excellent option if you don't need to script or automate a lot of conversions and don't mind paying $24 per year.
Textract's user interface is surprisingly simple for an AWS cloud service. While it is possible to set up Textract through the standard multi-step AWS setup and coding process, Amazon also provides a drag-and-drop web demo that allows you to download the results as zipped CSV files. All you have to do is create a free Amazon AWS account.
- Score: Excellent; by far the best option we tested on a challenging scanned PDF.
- Cost: 1.5 cents per page (100 pages per month free for your first three months at AWS).
- Pros: Lets you view results in a merged or unmerged cell layout; simple to use; reasonably priced; the best option on a challenging scanned PDF.
- Cons: Uploads are limited to files of 10 pages at a time. For those who want to automate, the API is trickier to use than some other options.
- Conclusion: A great option if you don't mind the AWS setup and either uploading manually or coding against a complex API.
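The 10-page upload limit and the per-page price lend themselves to a quick back-of-the-envelope calculation. This sketch only plans page ranges and estimates cost from the figures quoted above; it does not split the PDF or call AWS, and the function names are made up for illustration.

```python
from decimal import Decimal
from typing import List, Tuple

PAGES_PER_UPLOAD = 10              # upload limit reported in this review
PRICE_PER_PAGE = Decimal("0.015")  # 1.5 cents per page, as quoted above
FREE_PAGES_PER_MONTH = 100         # free allowance during the first three months


def page_batches(total_pages: int, batch_size: int = PAGES_PER_UPLOAD) -> List[Tuple[int, int]]:
    """Return inclusive 1-based (first_page, last_page) ranges, one per upload."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def monthly_cost(pages: int, free_pages: int = FREE_PAGES_PER_MONTH) -> Decimal:
    """Estimated charge in dollars for one month's pages after the free allowance."""
    billable = max(pages - free_pages, 0)
    return billable * PRICE_PER_PAGE
```

By this estimate, a 23-page PDF would need three uploads (pages 1-10, 11-20, and 21-23), and 250 pages in a month during the free period would cost about $2.25.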
Try Tabula if you're looking for free, open-source software. Unlike some free Python alternatives, Tabula is simple to install and use. It also offers both a command-line interface and a browser interface, making it suitable for batch conversions as well as point-and-click use.
Tabula performed very well on PDFs of low to moderate complexity (as did many of the paid platforms), although it had a problem with the difficult PDF. On Linux and Windows, Tabula needs a separate Java installation.
- Score: Very good, and the price is unbeatable.
- Cost: Free.
- Pros: Free; straightforward installation; GUI and scripting options; manual control over which parts of the page are checked for tables; can export results as CSV, TSV, JSON, or a script; two data-extraction methods.
- Cons: Works only with app-generated PDFs; complex formatting required some manual data cleanup.
- Conclusion: A good choice if cost, ease of use, and automation options are important to you and your PDFs aren't scanned.
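For batch conversions, Tabula's command-line interface can be driven from a short script. The sketch below assumes the tabula-java command-line jar has been downloaded; the jar path is hypothetical, and the flag names (`--pages`, `--format`, `--lattice`, `--stream`, `--outfile`) are the tabula-java options as we understand them, so confirm them with `--help` for your version.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical location of the tabula-java command-line jar; adjust for your install.
TABULA_JAR = Path("tabula.jar")


def tabula_command(pdf: Path, out_dir: Path, lattice: bool = False) -> List[str]:
    """Build one tabula-java invocation that writes <pdf name>.csv into out_dir."""
    mode = "--lattice" if lattice else "--stream"  # the two extraction methods
    return [
        "java", "-jar", str(TABULA_JAR),
        "--pages", "all",
        "--format", "CSV",
        mode,
        "--outfile", str(out_dir / (pdf.stem + ".csv")),
        str(pdf),
    ]


def convert_folder(pdf_dir: Path, out_dir: Path, lattice: bool = False) -> None:
    """Run Tabula once per PDF in pdf_dir (requires Java and the jar)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        subprocess.run(tabula_command(pdf, out_dir, lattice), check=True)
```

As a rule of thumb, lattice mode suits tables with ruled cell borders, while stream mode relies on the whitespace between columns; trying both on a sample page is usually the quickest way to choose.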
A major benefit of PDFTables is automation. Its API is well documented and supports a wide range of programming languages, including Java, C++, PHP, Python, R, Windows PowerShell, and VBA (Office Visual Basic for Applications).
PDFTables performed well on most of the app-generated PDF tables, even recognizing that a two-column header would work best as a single-column header row. It did have some trouble with data in cells that stretched across two lines in columns that were otherwise mostly empty. And although it choked on the nightmare scanned PDF, at least it didn't charge me for that.
- Score: Very good overall; excellent for automation.
- Cost: 50 pages free at signup, including use of the API. After that, it's $40 for up to 1,000 pages, and credits are valid for only a year.
- Pros: Excellent API; outperformed some of its paid competitors on the moderately complex PDF.
- Cons: Expensive, especially if you need more than the 50 free pages but convert fewer than 1,000 pages a year. Doesn't work with scanned PDFs.
- Conclusion: Works well and is easy to use both online and through scripting and programming. If you don't need a polished API, though, you may prefer a less expensive option.
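To see why the pricing is hard on light users, it helps to work out the effective price per page. This is a rough sketch based only on the figures quoted above; it assumes credits are bought in whole $40 bundles of 1,000 pages, that the 50 free pages are used first, and that unused credits are simply lost after a year.

```python
from decimal import Decimal

FREE_PAGES = 50               # free pages at signup, per this review
BUNDLE_PRICE = Decimal("40")  # dollars per paid bundle
BUNDLE_PAGES = 1000           # pages covered by one bundle


def yearly_cost(pages: int) -> Decimal:
    """Dollars spent in a year, assuming credits are bought in whole bundles."""
    paid_pages = max(pages - FREE_PAGES, 0)
    bundles = -(-paid_pages // BUNDLE_PAGES)  # ceiling division
    return bundles * BUNDLE_PRICE


def cost_per_page(pages: int) -> Decimal:
    """Effective price per converted page, rounded to the nearest cent."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return (yearly_cost(pages) / pages).quantize(Decimal("0.01"))
```

Under these assumptions, someone converting 200 pages a year pays $40, or about 20 cents per page, while someone converting 1,050 pages pays the same $40, or about 4 cents per page.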
This freemium platform also offers paid options. It turned out to be the only free option that could handle our problem scanned PDF.
- Score: Good.
- Cost: Free in the cloud; $5 per month or $49 per year for the premium cloud service, which offers faster service and batch conversions; desktop software is $35 for 30 days of use or $150 for a lifetime license.
- Pros: The free option is quite capable; works with scanned PDFs; reasonably priced.
- Cons: No cloud automation or API (we didn't test the desktop program); batch conversions require a premium option; multi-line data in a single row is split into multiple rows.
- Conclusion: A good balance of price and features. It was most compelling on complex scanned PDFs, but other tools did better when cell data spanned multiple lines.
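When a converter splits one multi-line cell into several rows, the overflow rows can often be folded back in with a short cleanup script. The sketch below is one possible approach, not something any of the tested tools provides. It assumes a rectangular table in which every real record has a value in a key column (the first column by default), so a row with a blank key cell is treated as a continuation of the row above.

```python
from typing import List

Row = List[str]


def merge_continuation_rows(rows: List[Row], key_col: int = 0) -> List[Row]:
    """Fold rows with an empty key cell back into the row above them.

    Assumption: every real record has a non-empty value in `key_col`, so a
    row whose key cell is blank is the overflow of a multi-line cell.
    """
    merged: List[Row] = []
    for row in rows:
        is_continuation = bool(merged) and not row[key_col].strip()
        if not is_continuation:
            merged.append(list(row))  # copy so the input is left untouched
            continue
        previous = merged[-1]
        for i, cell in enumerate(row):
            if cell.strip() and i < len(previous):
                previous[i] = f"{previous[i]} {cell.strip()}".strip()
    return merged
```

If the key column can legitimately be empty in your data, this heuristic will merge rows that should stay separate, so spot-check the result against the PDF.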
This web-based service is known for converting among many file formats: besides Excel, it can output Word, PowerPoint, AutoCAD, HTML, OpenOffice, and others. A free account can convert up to five files per week (30MB each); paying customers get unlimited conversions (up to 2GB of data per day).
Cometdocs supports public service journalism and provides members of Investigative Reporters & Editors with free premium accounts (disclosure: I have one).
- Score: Good.
- Cost: Five free conversions per week; otherwise $10 per month, $70 per year, or $130 for a lifetime license.
- Pros: Generally good results; performed remarkably well on a 2-page PDF with a complex table format; works with scanned PDFs; multiple input and output formats.
- Cons: Splits multi-line data from one row into multiple rows; not as robust on complex scanned PDFs as some other options; no clear scripting/automation option.
- Conclusion: Particularly worth a look if you want to export to multiple formats, not just Excel.
Many people are unaware that Excel has a direct PDF import feature, but it is only available on Windows computers with a Microsoft 365 or Office 365 subscription. It was a good option for the simple file, but it became harder to use as PDF complexity increased. People who are unfamiliar with Excel's Power Query / Get & Transform interface may also find it confusing.
How to import a PDF into Excel: In the Ribbon toolbar, go to Data > Get Data > From File > From PDF and choose your file. For a single table, you'll probably have just one option to import. Select it, and you should see a preview of the table along with a choice to load it or transform the data before loading. Click Load, and the table will appear in your Excel sheet.
For a single table on a single page, this is a quick and fairly simple option. It also works well if you have multiple tables in a multi-page PDF, as long as each table fits on a single page. However, things get a little more complicated if you have one table spread across multiple PDF pages, and you'll need to be familiar with Power Query techniques.
Comparing Power Query data transformation to the alternatives is a little unfair because the output of any of these other PDF to Excel converters could be imported into Excel for Power Query manipulation.
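When a table runs across several pages, a converter typically hands back one table per page, and those tables have to be stacked into one. That step can be done in Power Query, or before the data ever reaches Excel. Here is a minimal sketch of the second route, assuming each page's table has been exported as CSV text and that the first row of the first page is the header; the function name is made up for illustration.

```python
import csv
import io
from typing import Iterable, List, Optional


def stack_page_tables(page_csvs: Iterable[str]) -> List[List[str]]:
    """Combine per-page CSV text into one table with a single header row.

    Assumption: the first row of the first page is the header, and later
    pages may or may not repeat it.
    """
    combined: List[List[str]] = []
    header: Optional[List[str]] = None
    for text in page_csvs:
        for row in csv.reader(io.StringIO(text)):
            if not row:
                continue  # skip blank lines
            if header is None:
                header = row
                combined.append(row)
            elif row != header:  # drop repeated headers on later pages
                combined.append(row)
    return combined
```

The result is a plain list of rows that can be written back out with `csv.writer` and opened in Excel as a single table.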
- Score: Good.
- Cost: Included with a Microsoft 365/Office 365 subscription on Windows.
- Pros: If you are familiar with Power Query, you can do a lot of built-in data wrangling without leaving Excel.
- Cons: Requires a Microsoft 365/Office 365 subscription on Windows; difficult to use on any but the simplest PDFs; doesn't work with scanned PDFs.
- Conclusion: Excel is worth a try if you already have Microsoft 365/Office 365 on Windows and you have a simple conversion task. If you already know Power Query, consider it for more of your PDF conversions. (If you don't, Power Query is an excellent skill for any Excel user to pick up.) But if your PDF is more difficult and you don't already use Power Query / Get & Transform, you're probably better off with another option.