{"id":635,"date":"2017-10-09T15:14:29","date_gmt":"2017-10-09T14:14:29","guid":{"rendered":"http:\/\/hamsterhirn.de\/?p=635"},"modified":"2017-10-09T15:14:29","modified_gmt":"2017-10-09T14:14:29","slug":"awk-script-to-remove-objects-from-a-pdf","status":"publish","type":"post","link":"https:\/\/hamsterhirn.de\/index.php\/2017\/10\/awk-script-to-remove-objects-from-a-pdf\/","title":{"rendered":"awk script to remove objects from a pdf"},"content":{"rendered":"<p>First, uncompress the PDF with pdftk if it is compressed:<br \/>\n<code class=\"preserve-code-formatting\">pdftk myfile.pdf output unc.pdf uncompress<\/code><\/p>\n<p>Then remove all objects that contain any of the keywords PDF-XChange, pdfxviewer.com, PDFXCViewer20 or Click to buy NOW:<br \/>\n<pre><code class=\"preserve-code-formatting\">awk &#039;\n&nbsp;&nbsp;&nbsp;&nbsp;BEGIN {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;found=0\n&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;{\n&nbsp;&nbsp;&nbsp;&nbsp;# Start of an object: begin buffering its lines\n&nbsp;&nbsp;&nbsp;&nbsp;if ( $0 ~ \/^[0-9 ]+obj\/ ) {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectFound=1;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectLineCounter=0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objektZeilen[objectLineCounter]=$0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectLineCounter++;\n&nbsp;&nbsp;&nbsp;&nbsp;} else if (objectFound == 1) {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objektZeilen[objectLineCounter]=$0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Flag the object if this line contains a keyword\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ( $0 ~ \/PDF-XChange|pdfxviewer.com|PDFXCViewer20|Click to buy NOW\/ ) {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;found=1;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# End of an unflagged object: print the buffered lines\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ( ( $0 ~ \/endobj\/ ) &amp;&amp; ( found == 0 ) ) {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (i=0; i&lt;length(objektZeilen); i++) 
{\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print objektZeilen[i];\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;delete objektZeilen;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectFound=0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;found=0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# End of a flagged object: discard the buffered lines\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ( ( $0 ~ \/endobj\/ ) &amp;&amp; ( found == 1 ) ) {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;delete objektZeilen;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectFound=0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;found=0;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;objectLineCounter++;\n&nbsp;&nbsp;&nbsp;&nbsp;} else {\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Outside any object: pass the line through unchanged\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print $0\n&nbsp;&nbsp;&nbsp;&nbsp;}\n&nbsp;&nbsp;&nbsp;&nbsp;}\n&#039; unc.pdf &gt; test.pdf<\/code><\/pre><\/p>\n<p>Recompress and repair the PDF with pdftk:<br \/>\n<code class=\"preserve-code-formatting\">pdftk test.pdf output comp.pdf compress<\/code><\/p>\n<p>Unfortunately, this also removed the OCR layer. 
I couldn&#8217;t find out which of the removed objects contained the OCR layer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>First, uncompress the PDF with pdftk if it is compressed: pdftk myfile.pdf output unc.pdf uncompress Then remove all objects that contain any of the keywords PDF-XChange, pdfxviewer.com, PDFXCViewer20 or Click to buy NOW: 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[316,66,317,147],"class_list":["post-635","post","type-post","status-publish","format-standard","hentry","category-it","tag-awk","tag-pdftk","tag-remove-objects","tag-shell"],"_links":{"self":[{"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/posts\/635","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/comments?post=635"}],"version-history":[{"count":1,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions"}],"predecessor-version":[{"id":636,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions\/636"}],"wp:attachment":[{"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/media?parent=635"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/categories?post=635"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hamsterhirn.de\/index.php\/wp-json\/wp\/v2\/tags?post=635"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}