<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>geedew &#187; PDF</title>
	<atom:link href="http://www.geedew.com/tag/pdf/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.geedew.com</link>
	<description>flirting with the accessible web</description>
	<lastBuildDate>Wed, 13 Apr 2011 22:46:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>How To: Converting PDF to Word and HTML</title>
		<link>http://www.geedew.com/2007/12/23/how-to-converting-pdf-to-word-and-html/</link>
		<comments>http://www.geedew.com/2007/12/23/how-to-converting-pdf-to-word-and-html/#comments</comments>
		<pubDate>Sun, 23 Dec 2007 08:36:38 +0000</pubDate>
		<dc:creator>drew</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[How-To]]></category>
		<category><![CDATA[Misc]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[convert]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Microsoft Office]]></category>
		<category><![CDATA[OpenOffice]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://geedew.com/wp-content/uploads/2007/12/23/how-to-converting-pdf-to-word-and-html/</guid>
		<description><![CDATA[Sites need to be able to interact in one single, universal space. -Tim Berners-Lee I started this little project because I have a client who needs to get his 24-page PDF online. The problem is that a 24-page (&#8230;) <p><a href="http://www.geedew.com/2007/12/23/how-to-converting-pdf-to-word-and-html/">Read the rest of this entry &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p align="center"><span style="size: 12px; font-weight: 400; color: #000000; font-family: calibri,veranda">Sites need to be able to interact in one single, universal space.</span></p>
<p align="right"><span style="size: 14px; font-weight: bold; color: #000000; font-family: calibri,veranda">-Tim Berners-Lee</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">I started this little project because I have a client who needs to get his 24-page PDF online.  The problem is that a 24-page PDF with all the bells and whistles ends up being over 5 MB in size.  This causes issues for people on slower-than-cable internet connections, as the loading time becomes horrendous.  To solve the problem, I am going to offer the PDF as an optional download and have all the links point to the converted HTML (HyperText Markup Language: what web pages are written in) version of whichever page the visitor wants to see.</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">This does, however, cause problems if something is updated in the PDF: the HTML is not dynamic or bound to the PDF, so an update will have to occur in both places.  The only way around that is to make the HTML the originating source and have the &#8216;download as pdf&#8217; link call a server-side script that packages the HTML as a PDF.  That, however, is too much for what this client needs, and the updating issue will have to be taken in stride.</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"> </span></p>
<blockquote>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"><strong>Tools Needed</strong>:<br />
An RTF or DOC reader that can convert to HTML (I prefer OpenOffice 2.2)<br />
A program designed to convert PDF to DOC format (I used Able2Doc, licensed)</span></p>
</blockquote>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">Unfortunately, in my case, the PDF contained a large number of tables that ended up as images after conversion. Because of this, I had to handle things a little differently, which I will explain later.</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"><br />
<strong>First things first, let&#8217;s convert to HTML</strong></span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">With Able2Doc, the software I used, you load the PDF and simply convert the file to DOC format.  Note that not many converters will go straight from PDF to DOC or RTF.  Once you have converted the PDF to DOC or RTF, you can open that file in Microsoft Office or OpenOffice.  Both can open these files and then export them as HTML.</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"><strong>Microsoft Office&#8217;s way of doing things</strong></span></p>
<p style="text-align: left;"><span style="size: 12px; color: #000000; font-family: calibri,veranda">Office is really simple. With the document open, go to File-&gt;Save As-&gt;Other</span></p>
<p>
<span style="size: 12px; color: #000000; font-family: calibri,veranda"> </span><a title="PDF to HTML Save as" href="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-word-saveas.png"><img style="border: 1px solid black;" src="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-word-saveas.png" alt="PDF to HTML Save as" width="500" height="300" /></a><br />
</p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">After that, change the file type to an HTML document, put in the name, and you&#8217;re done!</span></p>
<p></p>
<p align="left"><a title="PDF to HTML Save as HTML" href="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-word-saveas-html.png"><img style="border: 1px solid black;" src="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-word-saveas-html.png" alt="PDF to HTML Save as HTML" width="500" height="300" /></a></p>
<p></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"><strong>OpenOffice&#8217;s way of doing things<br />
</strong>In OpenOffice, it is actually easier! Just have your document open, go to File -&gt;Save As, and select HTML from the drop-down list.  There is no extra step as there is in Word.</span></p>
<p align="left"> </p>
<p align="left"> </p>
<p align="left"><a title="PDF to HTML OO" href="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-oo2-saveas-html.png"><img style="border: 1px solid black;" src="http://geedew.com/blog/wp-content/uploads/2007/12/pdftohtml-oo2-saveas-html.png" alt="PDF to HTML OO" width="500" height="300" /></a></p>
<p></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda"><br />
<strong>When things get messy&#8230;<br />
</strong>You have to start to get creative. I know, it stinks when things just don&#8217;t go your way.  I mentioned earlier that my specific issue could not be settled by this process alone, because the tables in the PDF were made up of images and the text did not stay inside the image-based tables when converted to HTML.  I ended up having to go with a slightly altered reality, but the end result for the user is nearly the same.</span></p>
<p align="left"><span style="size: 12px; color: #000000; font-family: calibri,veranda">My idea was to split the PDF into images, which was actually really easy to do.  I swapped over to Linux for this part (Ubuntu Gutsy).  The PDF reader program can export a PDF&#8217;s pages as JPGs.  This came in very handy: I simply exported the PDF I was using, 28 pages of it, as JPGs and then used JavaScript to make a nice little setup for checking out the pictures.</span></p>
<blockquote>
<h5><span style="font-family: calibri; ">JavaScript code in the &lt;body&gt; tags:</span><br />
<span style="color: #888888;"><span style="font-weight: normal;">&lt;script&gt; function PageQuery(q) {if(q.length &gt; 1) this.q = q.substring(1, q.length);else this.q = null;<br />
this.keyValuePairs = new Array();<br />
if(q) {<br />
for(var i=0; i &lt; this.q.split("&amp;").length; i++) {<br />
this.keyValuePairs[i] = this.q.split("&amp;")[i];<br />
}<br />
}<br />
this.getKeyValuePairs = function() { return this.keyValuePairs; }<br />
this.getValue = function(s) {<br />
for(var j=0; j &lt; this.keyValuePairs.length; j++) {<br />
if(this.keyValuePairs[j].split("=")[0] == s)<br />
return this.keyValuePairs[j].split("=")[1];<br />
}<br />
return false;<br />
}<br />
this.getParameters = function() {<br />
var a = new Array(this.getLength());<br />
for(var j=0; j &lt; this.keyValuePairs.length; j++) {<br />
a[j] = this.keyValuePairs[j].split("=")[0];<br />
}<br />
return a;<br />
}<br />
this.getLength = function() { return this.keyValuePairs.length; }<br />
}<br />
function queryString(key){<br />
var page = new PageQuery(window.location.search);<br />
return unescape(page.getValue(key));<br />
}<br />
function displayItem(key){<br />
if(queryString(key)=='false')<br />
{<br />
return '1';<br />
}else{<br />
return queryString(key);<br />
}<br />
}<br />
&lt;/script&gt;</span></span></h5>
</blockquote>
<p><span style="size: 12px; color: #000000; font-family: calibri,veranda"><strong>What else?<br />
</strong></span></p>
<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">Once this code is in place, you can see what it is trying to do.  It parses the query string of the page URL and looks up the value of whatever variable name you pass in.  This gives your JavaScript an equivalent of PHP&#8217;s GET variables.</span></p>
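<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">As a rough modern sketch of the same lookup (not the code used on the site), current browsers ship a built-in URLSearchParams class that does the splitting for you.  The helper name getQueryValue and its fallback argument are made up for illustration; the fallback plays the same role as the default page 1 that displayItem() returns.</span></p>

```javascript
// Hypothetical helper (not part of the original script): read one
// query-string value and fall back to a default when it is missing or empty.
// `search` is a string shaped like window.location.search, e.g. "?p=3".
function getQueryValue(search, key, fallback) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const value = params.get(key);              // null when the key is absent
  return value === null || value === "" ? fallback : value;
}

// In a page you would pass window.location.search; a literal is used here
// so the sketch also runs outside a browser.
console.log(getQueryValue("?p=3", "p", "1")); // 3
console.log(getQueryValue("", "p", "1"));     // 1
```

<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">Unlike the hand-rolled parser, this version also decodes escaped characters in the value, so no separate unescape() call is needed.</span></p>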
<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">Now you need the code that ties the displayed image to a select box, so the complete picture will come into view.</span></p>
<blockquote>
<h5><span style="font-weight: normal;"><span style="color: #888888;">&lt;script&gt;<br />
// Load a page image: Page1.jpg when value is empty, otherwise the page named by the "p" query variable<br />
function loadPage(value) {<br />
if(value == "") {<br />
document.getElementById('mainimage').src = "img/ProductCatalog/Page1.jpg";<br />
} else {<br />
document.getElementById('mainimage').src = "img/ProductCatalog/Page" + displayItem('p') + ".jpg";<br />
}<br />
}<br />
// Show the image whose path is the value of the selected option<br />
function changeImage() {<br />
var list = document.getElementById('list');<br />
document.getElementById('mainimage').src = list.options[list.selectedIndex].value;<br />
}<br />
// Step back one option, wrapping from the first to the last<br />
function prevImage() {<br />
var list = document.getElementById('list');<br />
if(list.selectedIndex == 0) {<br />
list.selectedIndex = list.options.length - 1;<br />
} else {<br />
list.selectedIndex--;<br />
}<br />
changeImage();<br />
}<br />
// Step forward one option, wrapping from the last to the first<br />
function nextImage() {<br />
var list = document.getElementById('list');<br />
if(list.selectedIndex == list.options.length - 1) {<br />
list.selectedIndex = 0;<br />
} else {<br />
list.selectedIndex++;<br />
}<br />
changeImage();<br />
}<br />
&lt;/script&gt;</span></span></h5>
</blockquote>
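<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">The wrap-around in prevImage() and nextImage() can also be sketched as plain arithmetic, which is easy to check without a page.  The helper names pagePath and wrapIndex are hypothetical, and the total of 28 is an assumption taken from the 28 exported JPGs; only the img/ProductCatalog/PageN.jpg pattern comes from the code above.</span></p>

```javascript
// Hypothetical helpers for illustration only.
// Build the image path for a 1-based page number, following the
// img/ProductCatalog/PageN.jpg pattern used by loadPage().
function pagePath(pageNumber) {
  return "img/ProductCatalog/Page" + pageNumber + ".jpg";
}

// Move `step` positions through `length` items, wrapping at both ends.
// Adding `length` before taking the remainder keeps the result non-negative
// for a step of -1.
function wrapIndex(current, step, length) {
  return (current + step + length) % length;
}

const totalPages = 28; // assumption: one option per exported JPG

console.log(wrapIndex(0, -1, totalPages)); // 27: "previous" on the first page goes to the last
console.log(wrapIndex(27, 1, totalPages)); // 0: "next" on the last page goes to the first
console.log(pagePath(wrapIndex(4, 1, totalPages) + 1)); // img/ProductCatalog/Page6.jpg
```

<p><span style="size: 12px; color: #000000; font-family: calibri,veranda">The index is zero-based like selectedIndex, so 1 is added before building the path.</span></p>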
<p><span style="font-family: calibri;">That was pretty much all the JavaScript I needed; I could handle everything else in HTML, which made things much easier.  That is all there is to it. If you have any questions, just ask.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.geedew.com/2007/12/23/how-to-converting-pdf-to-word-and-html/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

