This question is for referencing and comparing. The solution is the accepted answer below.

I have spent many hours searching for a fast, easy and, above all, accurate way to get the number of pages in a PDF document. I work for a graphic printing and reproduction company that handles a lot of PDFs, so the number of pages in a document must be known precisely before it is processed. The PDF documents come from many different clients, so they aren't generated by the same application and don't necessarily use the same compression method.

Here are some of the answers I found insufficient or simply NOT working:

Using Imagick (a PHP extension)

Imagick requires a lot of installation work and an Apache restart, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1 page for every document (I haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages() and identifyImage() methods.

Using FPDI (a PHP library)

FPDI is easy to install and use (just extract the files and call a PHP script), BUT many compression techniques are not supported by FPDI. It then returns an error:

FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.

Opening a stream and searching with a regular expression

This opens the PDF file as a stream and searches for some kind of string containing the page count or something similar:

```php
$content = fread($stream, filesize($f));
if (preg_match_all($regex, $content, $matches)) {
    $count = max($matches[1]); // largest number captured by $regex
}
```

Regular expressions found by Googling (all linked to SO answers):

- /\/Count\s+(\d+)/ (looks for /Count followed by a number) doesn't work because only a few documents contain the /Count parameter, so most of the time it doesn't return anything.
- /\/Page\W*(\d+)/ (looks for /Page followed by a number) doesn't get the number of pages; the match mostly contains some other data.
- /\/N\s+(\d+)/ (looks for /N followed by a number) doesn't work either, as a document can contain multiple /N values, most (if not all) of which are not the page count.

So, what does work reliably and accurately?

A simple command line executable called pdfinfo. It is downloadable for Linux and Windows: you download a compressed file containing several small PDF-related programs, and one of those files is pdfinfo (or pdfinfo.exe on Windows).

An example of the data returned by running it on a PDF document:

```
Title: test1.pdf
Producer: Acrobat Distiller 9.2.0 (Windows)
```

I haven't seen a PDF document for which it returned a wrong page count (yet). It is also really fast: even for big documents of 200 MB, the response time is just a few seconds or less.

There is an easy way of extracting the page count from the output, here in PHP:

```php
// Make a function for convenience
$cmd = "C:\\path\\to\\pdfinfo.exe"; // Windows
// Surround with double quotes if file name has spaces
exec("$cmd \"$document\"", $output);
$pagecount = preg_match('/^Pages:\s*(\d+)/m', implode("\n", $output), $m) ? (int) $m[1] : 0;
```

Of course, this command line tool can be used from any language that can parse the output of an external program, but I use it in PHP. I know it's not pure PHP, but external programs are far better at PDF handling (as seen in the question).

I hope this helps people: I spent a whole lot of time trying to find a solution, and I have seen many questions about PDF page counts in which I didn't find the answer I was looking for. That's why I asked this question and answered it myself.

Security notice: use escapeshellarg on $document if the document name comes from user input or file uploads.

I created a wrapper class for pdfinfo, based on Richard's answer, in case it's useful to anyone. Per its doc comments, it is a wrapper for the pdfinfo program (part of the xpdf bundle): it runs exec(self::PDFINFO_CMD . …), loops over each line of output, splits it into a key and a value, and puts all of the pdfinfo output into a keyed array that is made accessible via getValue(). Keys are lowercased with strtolower, so the $key parameter (the key name) is case-insensitive.

Here is a Windows command script using gsscript that reports the page count of a PDF file.
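As a sketch of the pdfinfo approach described above, here is a slightly more defensive page-count function. It assumes pdfinfo is on the PATH (the `PDFINFO_PATH` constant is a placeholder to adjust), and the names `getPdfPageCount()` and `parsePageCount()` are hypothetical, not taken from the answer. It applies the `escapeshellarg` advice to both the program path and the document name (which also takes care of spaces in file names), checks pdfinfo's exit status, and keeps the parsing of pdfinfo's `Pages:` line in its own function so it can be checked without running pdfinfo.

```php
<?php
// Hypothetical placeholder: point this at your own pdfinfo binary
// (e.g. C:\path\to\pdfinfo.exe on Windows).
const PDFINFO_PATH = 'pdfinfo';

/**
 * Extract the page count from pdfinfo's output lines.
 * Returns 0 when no "Pages:" line is found.
 */
function parsePageCount(array $lines): int
{
    foreach ($lines as $line) {
        // pdfinfo prints a line such as "Pages:          12";
        // anchoring at ^ avoids matching "Pages:" inside e.g. a title.
        if (preg_match('/^Pages:\s*(\d+)/i', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return 0;
}

/**
 * Run pdfinfo on $document and return its page count (0 on failure).
 */
function getPdfPageCount(string $document): int
{
    $output = [];
    // escapeshellarg() quotes each argument, so file names with spaces work
    // and user-supplied names cannot inject extra shell commands.
    $cmd = escapeshellarg(PDFINFO_PATH) . ' ' . escapeshellarg($document);
    exec($cmd, $output, $status);
    if ($status !== 0) {
        return 0; // pdfinfo missing, file unreadable, or not a PDF
    }
    return parsePageCount($output);
}
```

For example, `echo getPdfPageCount('test 1.pdf');` should print the page count, or 0 if pdfinfo could not be run or could not read the file.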
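The pdfinfo wrapper class mentioned above is only described, not shown in full, so here is a hypothetical sketch of the behaviour its comments describe: run pdfinfo once, split each output line into key and value, store everything under lowercased keys, and read values back case-insensitively through `getValue()`. `PDFINFO_CMD` and `getValue()` come from that description; `PdfInfo`, `fromFile()`, `fromOutput()` and `getPages()` are illustrative names, and `PDFINFO_CMD` assumes pdfinfo is on the PATH. It needs PHP 7.4 or later for the typed property.

```php
<?php
/**
 * Sketch of a pdfinfo wrapper: stores every "Key: value" line of pdfinfo
 * output in an array with lowercased keys, readable via getValue().
 */
class PdfInfo
{
    public const PDFINFO_CMD = 'pdfinfo'; // adjust to your install path

    /** @var array<string, string> lowercased key => value */
    private array $info = [];

    public static function fromFile(string $document): self
    {
        $output = [];
        exec(escapeshellarg(self::PDFINFO_CMD) . ' ' . escapeshellarg($document), $output);
        return self::fromOutput($output);
    }

    /** Build from already-captured pdfinfo output lines. */
    public static function fromOutput(array $lines): self
    {
        $self = new self();
        foreach ($lines as $line) {
            // Split on the first colon only: values such as dates contain colons too.
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $self->info[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
        }
        return $self;
    }

    /**
     * @param string $key key name, case insensitive
     * @return string|null the value, or null if pdfinfo did not report it
     */
    public function getValue(string $key): ?string
    {
        return $this->info[strtolower($key)] ?? null;
    }

    public function getPages(): int
    {
        return (int) ($this->getValue('Pages') ?? 0);
    }
}
```

Typical use would be `PdfInfo::fromFile('test1.pdf')->getPages()`. Keeping `fromOutput()` separate from `fromFile()` means the key/value parsing can be exercised on captured output without a PDF or the pdfinfo binary.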