post Category: General Musings, The Office — Jon Watson @ 6:00 pm — post Comments (0)

In keeping with my long tradition of blogging about things that only interest me and not caring a whit about what might interest you, this entry is dedicated to an area of deep confusion on the Interwebs:

Why can’t I send a PDF form via email with the free Adobe Reader?

First off, I would like to congratulate you on at least arriving at this correct conclusion. The amount of misinformation on the Internet about how to send data from a PDF form being rendered in Adobe Reader is truly titanic and it is hard to figure out the truth of the matter which is: you can’t.

The Adobe Reader is free. Adobe made a pretty good attempt many years ago to make the PDF format the de facto standard for passing documents around. Adobe PDFs render the same on every system, thus preserving formatting, and they are unalterable, which makes them better than Word documents or other editable documents for non-repudiation.

However, let us never forget that this application is called Adobe Reader, meaning that it can only read PDF documents. If the Adobe Reader could actually alter the PDF document then the argument for purchasing the very expensive Adobe Acrobat would be a hard one to make. In order to keep hordes of Adobe programmers in their iPhones and granola bars, Adobe Reader cannot alter PDF documents.

That makes sense and we can all nod sagely at that as a good business decision. There’s a bit more to this which isn’t so obvious and is the source of the aggravating ‘PDF via email’ failure issue. While the Adobe Reader can allow the user to fill out a PDF form, it cannot save that form data. This is an extension of the ‘cannot alter’ caveat because if users were able to save PDF forms and later re-open them with their saved data, that is, in effect, altering the PDF which is not within Reader’s purview.

The fly in the ointment is that in order to allow users to send a filled-in PDF form via email, the form must be sent as an attachment and before a file can be attached to an email it has to…well…exist. Since we cannot save PDF forms, the file does not exist and it therefore cannot be attached to an email. Follow that?

So how do we get around this?

We don’t. We suck it up.

The (marginally) good news is that just because Adobe Reader cannot send form data as a PDF does not mean that it cannot send form data at all. In fact, Adobe Reader is perfectly able to send the data from any given PDF form as XML via email. The caveat is that whoever is receiving said data must have some method of dealing with this XML data. The easiest way, of course, is to have the aforementioned very expensive Adobe Acrobat installed, which will happily suck this XML data into the original form for rendering or shoot it into an Excel spreadsheet. However, any reasonably competent programmer can parse incoming XML into an almost limitless array of possibilities.
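
For the "reasonably competent programmer" route, here is a minimal sketch in Python of pulling field values out of a submitted XML attachment. The element names (form1, FirstName and so on) are made up for illustration; the real ones depend entirely on how the form was designed, so treat the payload as an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the real element names depend on how the form was built.
SAMPLE_XML = """<form1>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <Email>jane@example.com</Email>
</form1>
"""


def form_fields(xml_text):
    """Return a dict mapping each leaf element's tag to its trimmed text."""
    root = ET.fromstring(xml_text)
    fields = {}
    for elem in root.iter():
        # Elements with no children are treated as form fields.
        if len(elem) == 0:
            fields[elem.tag] = (elem.text or "").strip()
    return fields


if __name__ == "__main__":
    for name, value in form_fields(SAMPLE_XML).items():
        print(f"{name}: {value}")
```

From a plain dict like this it is a short hop to a database row or a line in a CSV file.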

The only other way to get around this is the probably rare situation where the person filling out the form has Adobe Acrobat. In that case, since Adobe Acrobat can most certainly save PDF files, there is no problem attaching a proper PDF to an email and sending it.

Lastly, if you really, really, really need users with only Adobe Reader to be able to return PDFs, then you can buy the insanely expensive and Frankensteinian Adobe LiveCycle Reader Extensions Server. Yep, this bad boy will let people using Adobe Reader return PDFs by virtue of sending them to your Adobe LiveCycle Reader Extensions Server for processing. And yes, you have to maintain your own hardware to run this bad boy so it’s really not something any human would likely look at.

The most oft-repeated misinformation I read on the web while researching this comes down to these two urban legends:

  1. Create a regular button instead of a “Submit Button”. On that button change the type dropdown to “Submit”, then click on the “Submit” tab and choose “PDF” as the filetype to send.
  2. Edit the XML for the Email Button and change the event format attribute from ‘xml’ to ‘pdf’.

Neither of these will work if your user who is filling out the form is using Adobe Reader. The limitation is in Adobe Reader and you can’t do a dang thing about it by messing around with your form at creation time. Seriously, try it. Go nuts. I did.

post Category: General Musings — Jon Watson @ 8:44 am — post Comments (0)

I’ve spent the last two weeks waist-deep in US medical provider data in an attempt to match federal-level provider identification numbers with state-level licensees. The two levels of government make little attempt to correlate their two data sets so matching them together was quite the task, but we did it. In broad terms, here’s how:

(more…)

post Category: General Musings, Security — Jon Watson @ 11:04 am — post Comments (0)

I have employed reCAPTCHA on many sites that I have worked on over time. For the uninitiated, a Captcha is a graphic representation of a word or words that are distorted to such an extent that it is unlikely a spam bot would be able to determine the original word. The words are clear enough to humans, however, so requiring a web user to solve a captcha before doing something like posting a comment on a blog or creating a new forum post is a fairly reliable way to keep spam bots out of your site.

reCAPTCHA is the name of a specific captcha solution which has found a clever way to use the collective brainpower of the human captcha solvers world-wide to assist in digitizing books. Simply put, when you employ reCAPTCHA on your site, the words that your users are asked to solve are words that OCR technology used in digitizing books has failed to recognize. Your human user tells reCAPTCHA what the word is by virtue of solving the captcha and then that knowledge goes into the OCR engine to help it correctly digitize subsequent occurrences of that word.

Brilliant! I thought. Until I actually did think. Eventually, I stumbled across the logic flaw of the process. Specifically, if reCAPTCHA doesn’t know what the word is, how does it know if the user has solved it properly? Turns out there is a very clever method to that as well. From the reCAPTCHA site:

… if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
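
The scheme in that quote can be sketched in a few lines of Python. This is only my reading of the description, not reCAPTCHA's actual code: the function names and the agreement thresholds (min_votes, min_share) are assumptions picked for illustration.

```python
from collections import Counter


def check_submission(control_answer, control_guess, unknown_guess, votes):
    """Pass the user only on the known word; if they pass, count their
    reading of the unknown word as one vote."""
    passed = control_guess.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[unknown_guess.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_votes=3, min_share=0.75):
    """Return the leading reading once enough users agree, else None.
    Both thresholds are made-up values, not reCAPTCHA's."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


if __name__ == "__main__":
    votes = Counter()
    check_submission("morning", "Morning", "upon", votes)   # solved, vote counts
    check_submission("morning", "mourning", "open", votes)  # failed, vote ignored
    check_submission("morning", "morning", "upon", votes)
    check_submission("morning", "morning", "upon", votes)
    print(accepted_reading(votes))
```

The point is that the user is only ever graded on the known word, while the unknown word quietly collects votes until enough people agree.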

I’m a big fan of the second part – the confidence part. I am currently involved in a data matching project where we are attempting to reliably match a massive chunk of state medical data with another, massiver chunk of federal medical data. The two files are completely oblivious to each other and do not have keys to each other’s data. We therefore have to use the data itself to attempt to find matches but much of the data is in inconsistent formats between files. Thus, we need to deliberately generate broad matches initially, and then iteratively narrow it down. We are doing this by scoring the matches so we can (hopefully) just look at the highest rated match for any given record and that will be the correct one. In reCAPTCHA’s case they are doing roughly the same thing by querying multiple users and then taking the highest rated possible answer and assuming it to be correct. Clever.
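
To make the scoring idea concrete, here is a rough Python sketch of the "match broadly, then score" approach. It is not our actual project code: the field names, the weights, the min_score cutoff and the sample records are all hypothetical.

```python
def normalize(value):
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


# Hypothetical fields and weights: a stronger identifier earns more points.
WEIGHTS = {"last_name": 3, "first_name": 2, "zip": 2, "phone": 4}


def score(state_rec, federal_rec):
    """Add up the weights of every field that agrees after normalization."""
    total = 0
    for field, weight in WEIGHTS.items():
        a = normalize(state_rec.get(field, ""))
        b = normalize(federal_rec.get(field, ""))
        if a and a == b:
            total += weight
    return total


def best_match(state_rec, federal_recs, min_score=5):
    """Return (score, record) for the highest-scoring candidate, or None
    if no candidate reaches min_score."""
    scored = [(score(state_rec, rec), rec) for rec in federal_recs]
    scored = [pair for pair in scored if pair[0] >= min_score]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    state = {"last_name": "O'Brien", "first_name": "Pat",
             "zip": "12345", "phone": "(555) 123-4567"}
    federal = [
        {"id": "A", "last_name": "OBRIEN", "first_name": "Patrick",
         "zip": "12345", "phone": "555.123.4567"},
        {"id": "B", "last_name": "Obrien", "first_name": "Pat",
         "zip": "54321", "phone": ""},
    ]
    print(best_match(state, federal))
```

Normalizing before comparing is what lets "(555) 123-4567" and "555.123.4567" count as the same phone number, which is most of the battle when the two files disagree on formatting.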
