Tuesday, 16 September 2014


PDF (Portable Document Format) is a file format developed by Adobe Systems in the early 1990s as a way to share documents containing text and images. The format presents documents independently of the application software, hardware, and operating system used to create or view them. It is an open standard for electronic document exchange maintained by the International Organization for Standardization (ISO). When documents, forms, graphics, or web pages are converted to PDF, they look the way they would if printed. To read PDF files you need a PDF viewer such as the free Adobe Reader software. Once Reader is installed, it starts automatically whenever you open a PDF file. PDF is especially useful for documents such as magazine articles, product brochures, or flyers, where you want to preserve the original graphic appearance online.



PDF version 1.7 includes all the functionality of the previous versions, 1.0 to 1.6, except for some features that Adobe removed because they do not conform to the ISO 32000-1 specification.


PDF combines three technologies:
·     A subset of the PostScript page description programming language, used for generating the layout and graphics.
·     A font embedding/replacement system that allows fonts to travel with the documents.
·     A structured storage system that bundles these elements and any associated content into a single file, with data compression where appropriate.
NOTE: PostScript is a page description language that runs in an interpreter to generate an image. In addition to graphics commands, it supports standard programming features such as if conditions and loops.
File Structure
The file structure of PDF determines how objects are stored in a PDF file, how they are accessed, and how they are updated. The structure is independent of the semantics of the objects. A PDF file contains text with some binary data mixed in. If you open a PDF file in a text editor, you will see the raw objects that define the structure and content of the document. PDF documents contain eight basic types of objects:
  1.          Boolean values (true or false)
  2.          Numbers
  3.          Strings
  4.          Arrays (ordered collections of objects)
  5.          Names
  6.          Dictionaries (collections of objects indexed by names)
  7.          Streams (for large amounts of data)
  8.          The null object
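As a rough illustration of how these eight types look in the raw file, the sketch below guesses the type of a short token from its delimiters. The function name classify_pdf_token and the sample tokens are hypothetical, and this is not a real PDF parser: it ignores comments, escapes, and nesting. It only assumes the usual raw syntax, in which the keywords true, false, and null are written in lowercase, names start with a slash, and a stream is a dictionary followed by the keyword stream.

```python
import re

def classify_pdf_token(token: str) -> str:
    """Guess which basic PDF object type a short raw token represents."""
    t = token.strip()
    if t in ("true", "false"):
        return "boolean"
    if t == "null":
        return "null"
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", t):
        return "number"
    if t.startswith("<<"):
        # A dictionary followed by the keyword "stream" introduces a stream.
        return "stream" if re.search(r">>\s*stream\b", t) else "dictionary"
    if t.startswith("(") or t.startswith("<"):
        return "string"  # literal (...) or hexadecimal <...> string
    if t.startswith("["):
        return "array"
    if t.startswith("/"):
        return "name"
    return "unknown"

# Hypothetical sample tokens, one per object type.
samples = [
    "true", "42", "(Hello)", "[1 2 3]", "/Type",
    "<< /Type /Page >>",
    "<< /Length 5 >>\nstream\nhello\nendstream",
    "null",
]
for s in samples:
    print(repr(s), "->", classify_pdf_token(s))
```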

The objects in a PDF file can be direct or indirect. An indirect object is labeled with an object number and a generation number; in “12 0 obj”, for example, 12 is the object number and 0 is the generation number. Other objects can then refer to it with an indirect reference such as “12 0 R”. Direct objects, by contrast, are embedded directly inside other objects rather than being referenced.
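To make the numbering concrete, here is a minimal sketch that scans raw bytes for indirect object definitions ("N G obj") and indirect references ("N G R"). The helper names and the sample bytes are made up for illustration, and a plain regular expression like this can produce false matches inside strings or compressed streams, so treat it as a way to explore a file rather than a parser.

```python
import re

# "12 0 obj" starts the definition of object 12, generation 0.
OBJ_DEF = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# "12 0 R" is a reference to that object from somewhere else.
OBJ_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

def find_indirect_objects(data: bytes):
    """Return (object number, generation number) for each definition."""
    return [(int(n), int(g)) for n, g in OBJ_DEF.findall(data)]

def find_references(data: bytes):
    """Return (object number, generation number) for each reference."""
    return [(int(n), int(g)) for n, g in OBJ_REF.findall(data)]

# Hypothetical fragment of a PDF body.
sample = b"""12 0 obj
<< /Type /Page /Parent 3 0 R /Contents 13 0 R >>
endobj
13 0 obj
(Hello)
endobj
"""

print(find_indirect_objects(sample))
print(find_references(sample))
```

In the sample, object 12 is a dictionary whose /Parent and /Contents entries are indirect references, while /Type /Page is a direct name object embedded in the dictionary.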


A PDF file can contain two types of metadata:

·     Document Information Dictionary: A set of key-value fields such as author, title, subject, and creation and update dates. It is stored in the optional Info entry of the file trailer. A small set of fields is defined, and it can be extended with additional text values if required.
·     Extensible Metadata Platform (XMP): Adds standards-based, extensible XML metadata, as used in other file formats. This allows metadata to be attached to individual parts of a document, such as illustrations, as well as to the document as a whole.
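The Document Information Dictionary is easy to picture as a plain key-value mapping. The sketch below pulls "/Key (value)" pairs out of a simplified dictionary string. The function name and the sample values are invented for illustration, and the pattern assumes literal strings with no nested parentheses or escape sequences, which real files do not guarantee.

```python
import re

def parse_info_dictionary(raw: str) -> dict:
    """Extract /Key (value) pairs from a simple, unescaped Info dictionary."""
    return dict(re.findall(r"/(\w+)\s*\(([^()]*)\)", raw))

# Hypothetical Info dictionary as it might appear in a text editor.
info = (
    "<< /Title (PDF Overview) /Author (Jane Doe) "
    "/CreationDate (D:20140916120000) >>"
)

for key, value in parse_info_dictionary(info).items():
    print(f"{key}: {value}")
```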


PDF files can be encrypted for security and digitally signed for authentication. Adobe has defined standards for securing PDF files. Two kinds of password can be used in a PDF. A User Password encrypts the file and prevents it from being opened. An Owner Password specifies operations that should be restricted even when the document is decrypted; these include printing, copying text and graphics out of the document, and modifying the document or adding to it. Because the User Password encrypts the file, defeating it requires password cracking, and the difficulty depends on the strength of the password and the encryption method used. The Owner Password does not prevent the file from being read; it relies on the client software to respect the restrictions and is therefore not fully secure. A number of third-party tools and free online services are available for cracking PDF passwords.
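The restricted operations are recorded as bit flags in an integer stored in the file's encryption dictionary. The sketch below decodes four of those flags. The bit positions follow my reading of the permissions table in the PDF specification (bit 3 for printing, bit 4 for modifying, bit 5 for copying, bit 6 for annotations) and are an assumption, not something stated in this post; the names PERMISSION_BITS and decode_permissions are hypothetical.

```python
# Assumed bit positions (1-based in the specification, so bit 3 is 1 << 2).
PERMISSION_BITS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
}

def decode_permissions(p: int) -> dict:
    """Map a permissions integer to {operation: allowed?}."""
    return {name: bool(p & mask) for name, mask in PERMISSION_BITS.items()}

# Hypothetical value: every bit set except modify, copy, and annotate.
print_only = -1 & ~(PERMISSION_BITS["modify"]
                    | PERMISSION_BITS["copy"]
                    | PERMISSION_BITS["annotate"])

print(print_only, decode_permissions(print_only))
```

Nothing in the file stops a viewer from ignoring these flags once the content is decrypted, which is why the restrictions depend on the client software cooperating.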


Usage rights signatures enable additional interactive features that are not available by default in a particular PDF viewer application.
The signature is used to validate that the permissions have been granted by an authentic granting authority. It allows the user to:
  •          Save the PDF document along with modified form data.
  •          Import data files in FDF, XFDF, and text formats.
  •          Export data files in FDF and XFDF formats.
  •          Submit form data.
  •          Instantiate new pages from named page templates.
  •          Apply digital signatures.
  •          Create, modify, delete, copy, import, and export annotations.


Some technical issues related to PDF files are discussed below:
  1. Scanned Documents: A PDF file created by scanning a hard copy document that contains primarily text does not have the same structure as a PDF file generated directly from the same document. Internally, the scanned file contains only a picture of each page, with no information about the text. A good quality scan can look like the native PDF file, but a poor quality scan produces a poor quality document.
  2. Accessibility: PDF files can be created specifically to be accessible to people with disabilities. As of 2014, PDF files can include XML tags, text equivalents, captions, audio descriptions, and so on. The file can also be magnified for readers with visual impairments.
  3. Viruses and Exploits: PDF file attachments can carry viruses; the first was discovered in 2001. That virus, named OUTLOOK.PDFWorm, used Microsoft Outlook to send itself as an attachment to a PDF file. One way of avoiding PDF file exploits is to have a local or web service convert the file to another format before viewing.
  4. Usage Restriction and Monitoring: PDF files can be encrypted so that a password is needed to view or edit the content. The PDF reference defines both 40-bit and 128-bit encryption. Adobe also provides a method to set security policies on specific documents.
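To give a sense of the gap between 40-bit and 128-bit encryption, the arithmetic below compares the number of possible keys. It is only a back-of-the-envelope sketch: the helper name keyspace is hypothetical, and in practice the strength of the password usually matters more than the raw key length.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys for a key of the given length in bits."""
    return 2 ** bits

small = keyspace(40)
large = keyspace(128)

print(f"40-bit keys:  {small:,}")
print(f"128-bit keys: {large:,}")
# The ratio is itself a power of two: 2 ** (128 - 40).
print(f"128-bit has 2**{128 - 40} times as many keys")
```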


Redaction is a form of editing in which multiple source texts are combined (redacted) and altered to make a single document. Redacting a PDF file lets you keep your document’s formatting while hiding sensitive information. It can, and should, be used to cover information such as Social Security numbers, competitive information, and even images.
PDF is preferred over most other file formats for documenting and communicating because it offers the following benefits:
  1.        Reliability
  2.        Open standard
  3.        Trustworthiness
  4.        Multiplatform support
  5.        File integrity
  6.        Easy accessibility
