Package org.apache.pdfbox.pdfparser
Class ConformingPDFParser
java.lang.Object
    org.apache.pdfbox.pdfparser.BaseParser
        org.apache.pdfbox.pdfparser.ConformingPDFParser
public class ConformingPDFParser extends BaseParser
- Author:
- Adam Nichols
-
-
Field Summary
Modifier and Type, Field:
protected RandomAccess inputFile
-
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE
-
-
Constructor Summary
ConformingPDFParser(java.io.File inputFile)
    Constructor.
-
Method Summary
Modifier and Type, Method, Description:
protected byte consumeWhitespace()
    This will read all bytes until a non-whitespace character is found.
protected byte consumeWhitespaceBackwards()
    This will read all bytes (backwards) until a non-whitespace character is found.
COSDocument getDocument()
    This will get the document that was parsed.
COSBase getObject(long objectNumber, long generation)
PDDocument getPDDocument()
    This will get the PD document that was parsed.
boolean isRecursivlyRead()
void parse()
    This will parse the stream and populate the COSDocument object.
protected COSNumber parseNumber(java.lang.String number)
protected long parseTrailerInformation()
protected COSBase processCosObject(java.lang.String string)
protected java.lang.String readBackwardUntilWhitespace()
protected byte readByte()
protected byte readByteBackwards()
protected COSDictionary readDictionaryBackwards()
protected int readInt()
    This will read an integer from the stream.
protected java.lang.String readLine()
    This will read a line starting with the byte at offset and going forward until it finds a newline.
protected java.lang.String readLineBackwards()
    This will read a line starting with the byte at offset and going backwards until it finds a newline.
protected long readLongBackwards()
    This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long.
protected COSName readNameBackwards()
protected COSNumber readNumber()
    This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
protected COSBase readObject()
    This actually reads the object data.
COSBase readObject(long objectNumber, long generation)
    This will read an object from the inputFile at whatever our currentOffset is.
protected COSBase readObjectBackwards()
protected java.lang.String readString()
    This will read the next string from the stream.
protected java.lang.String readWord()
void setRecursivlyRead(boolean recursivlyRead)
-
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
clearResources, isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseCOSString, parseDirObject, readExpectedString, readGenerationNumber, readLong, readObjectNumber, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces
-
-
-
-
Field Detail
-
inputFile
protected RandomAccess inputFile
-
-
Method Detail
-
parse
public void parse() throws java.io.IOException
This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
- Throws:
java.io.IOException
- If there is an error reading from the stream or corrupt data is found.
-
getDocument
public COSDocument getDocument() throws java.io.IOException
This will get the document that was parsed. parse() must be called before this method is called. When you are done with this document, you must call close() on it to release resources.
- Returns:
- The document that was parsed.
- Throws:
java.io.IOException
- If there is an error getting the document.
-
getPDDocument
public PDDocument getPDDocument() throws java.io.IOException
This will get the PD document that was parsed. When you are done with this document, you must call close() on it to release resources.
- Returns:
- The document at the PD layer.
- Throws:
java.io.IOException
- If there is an error getting the document.
-
parseTrailerInformation
protected long parseTrailerInformation() throws java.io.IOException, java.lang.NumberFormatException
- Throws:
java.io.IOException
java.lang.NumberFormatException
-
readByteBackwards
protected byte readByteBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readByte
protected byte readByte() throws java.io.IOException
- Throws:
java.io.IOException
-
readBackwardUntilWhitespace
protected java.lang.String readBackwardUntilWhitespace() throws java.io.IOException
- Throws:
java.io.IOException
-
consumeWhitespaceBackwards
protected byte consumeWhitespaceBackwards() throws java.io.IOException
This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
java.io.IOException
- if there is an error reading from the file
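The contract described above can be sketched in a few lines. This is only an illustration, not the PDFBox implementation: the class BackwardWhitespaceSketch, its byte-array backing store and its offset() accessor are hypothetical stand-ins for the parser's inputFile and current offset, and the whitespace set is the six white-space characters defined by the PDF specification.

```java
// Hypothetical stand-in for the parser state: a byte array plus a current offset.
// NOT the PDFBox implementation; it only illustrates the documented contract.
final class BackwardWhitespaceSketch {
    private final byte[] data;
    private int offset; // index of the current byte

    BackwardWhitespaceSketch(byte[] data, int startOffset) {
        this.data = data;
        this.offset = startOffset;
    }

    // The six PDF white-space characters: NUL, TAB, LF, FF, CR and SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Steps backwards while the current byte is whitespace, then returns the
    // first non-whitespace byte. The offset is left on that byte, so calling
    // the method again returns the same byte without moving.
    byte consumeWhitespaceBackwards() {
        while (isWhitespace(data[offset])) {
            if (offset == 0) {
                throw new IllegalStateException("only whitespace before the start of the data");
            }
            offset--;
        }
        return data[offset];
    }

    int offset() {
        return offset;
    }
}
```

Because the returned byte is also the byte left under the offset, a caller can inspect it immediately without issuing a second read.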
-
consumeWhitespace
protected byte consumeWhitespace() throws java.io.IOException
This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
java.io.IOException
- if there is an error reading from the file
-
readLongBackwards
protected long readLongBackwards() throws java.io.IOException, java.lang.NumberFormatException
This will consume any whitespace, read in bytes until whitespace is found again, and then parse the characters that were read as a long. The current offset will then point at the first whitespace character that precedes the number.
- Returns:
- the parsed number
- Throws:
java.io.IOException
- if there is an error reading from the file
java.lang.NumberFormatException
- if the bytes read cannot be converted to a number
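The three steps in this description (skip whitespace, collect bytes until whitespace, parse) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PDFBox code: ReadLongBackwardsSketch, its byte-array backing store and its offset() accessor are hypothetical, and the input is assumed to be ASCII text such as the offset that follows a startxref keyword.

```java
// Hypothetical illustration of reading a number backwards from a byte buffer.
// NOT the PDFBox implementation.
final class ReadLongBackwardsSketch {
    private final byte[] data;
    private int offset; // index of the current byte

    ReadLongBackwardsSketch(byte[] data, int startOffset) {
        this.data = data;
        this.offset = startOffset;
    }

    // The six PDF white-space characters: NUL, TAB, LF, FF, CR and SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    long readLongBackwards() {
        // Step 1: skip any whitespace at or before the current offset.
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        // Step 2: collect bytes until whitespace (or the start of the data).
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0 && !isWhitespace(data[offset])) {
            reversed.append((char) data[offset]);
            offset--;
        }
        // Step 3: the bytes were collected last-to-first, so reverse them
        // before parsing. Long.parseLong throws NumberFormatException for
        // empty or non-numeric text.
        return Long.parseLong(reversed.reverse().toString());
    }

    int offset() {
        return offset;
    }
}
```

Starting on the last byte of the ASCII text "startxref", line feed, "1234", line feed, the sketch returns 1234 and leaves the offset on the line feed that precedes the digits, which matches the documented end position.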
-
readInt
protected int readInt() throws java.io.IOException
Description copied from class: BaseParser
This will read an integer from the stream.
- Overrides:
readInt in class BaseParser
- Returns:
- The integer that was read from the stream.
- Throws:
java.io.IOException
- If there is an error reading from the stream.
-
readNumber
protected COSNumber readNumber() throws java.io.IOException
This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
- Returns:
- the COSNumber which was read/parsed
- Throws:
java.io.IOException
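The integer-or-float decision can be illustrated without the COS classes. In the sketch below, Long and Double are stand-ins for COSInteger and COSFloat, and the rule "a decimal point means a real number" is an assumption based on PDF number syntax; this page does not document how the actual method decides. NumberParseSketch is a hypothetical name.

```java
// Hypothetical illustration; Long and Double stand in for COSInteger and COSFloat.
final class NumberParseSketch {
    // Assumed rule: text containing a decimal point is a real number,
    // anything else is an integer. Both branches throw
    // NumberFormatException if the text is not a valid number.
    static Number parseNumber(String text) {
        if (text.indexOf('.') >= 0) {
            return Double.valueOf(text);
        }
        return Long.valueOf(text);
    }
}
```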
-
parseNumber
protected COSNumber parseNumber(java.lang.String number) throws java.io.IOException
- Throws:
java.io.IOException
-
processCosObject
protected COSBase processCosObject(java.lang.String string) throws java.io.IOException
- Throws:
java.io.IOException
-
readObjectBackwards
protected COSBase readObjectBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readNameBackwards
protected COSName readNameBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
getObject
public COSBase getObject(long objectNumber, long generation) throws java.io.IOException
- Throws:
java.io.IOException
-
readObject
public COSBase readObject(long objectNumber, long generation) throws java.io.IOException
This will read an object from the inputFile at whatever our currentOffset is. If the object number and generation are not the expected values and this parser is set to throw an exception for non-conforming documents, then an exception will be thrown.
- Parameters:
objectNumber
- the object number you expect to read
generation
- the generation you expect this object to have
- Returns:
- the object being read.
- Throws:
java.io.IOException
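The check described above, comparing the object number and generation found in the file with the expected values, can be sketched as follows. The class ObjectHeaderCheckSketch, the method checkHeader and the strict flag are hypothetical; strict stands in for "set to throw an exception for non-conforming documents". The header form "12 0 obj" is standard PDF syntax for an indirect object, but how the parser itself reads the header is not documented here.

```java
import java.io.IOException;

// Hypothetical illustration of validating an indirect object header.
// NOT the PDFBox implementation.
final class ObjectHeaderCheckSketch {
    // Parses a header such as "12 0 obj" and compares it with the expected
    // values. Returns true on a match. On a mismatch it throws when strict
    // is true and returns false otherwise. Non-numeric fields cause
    // Long.parseLong to throw NumberFormatException.
    static boolean checkHeader(String header, long expectedObject,
                               long expectedGeneration, boolean strict) throws IOException {
        String[] parts = header.trim().split("\\s+");
        if (parts.length != 3 || !"obj".equals(parts[2])) {
            throw new IOException("Not an object header: " + header);
        }
        long objectNumber = Long.parseLong(parts[0]);
        long generation = Long.parseLong(parts[1]);
        boolean matches = objectNumber == expectedObject && generation == expectedGeneration;
        if (!matches && strict) {
            throw new IOException("Expected object " + expectedObject + " " + expectedGeneration
                    + " but found " + objectNumber + " " + generation);
        }
        return matches;
    }
}
```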
-
readObject
protected COSBase readObject() throws java.io.IOException
This actually reads the object data.
- Returns:
- the object which is read
- Throws:
java.io.IOException
-
readString
protected java.lang.String readString() throws java.io.IOException
This will read the next string from the stream.
- Overrides:
readString in class BaseParser
- Returns:
- The string that was read from the stream.
- Throws:
java.io.IOException
- If there is an error reading from the stream.
-
readDictionaryBackwards
protected COSDictionary readDictionaryBackwards() throws java.io.IOException
- Throws:
java.io.IOException
-
readLineBackwards
protected java.lang.String readLineBackwards() throws java.io.IOException
This will read a line starting with the byte at offset and going backwards until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Returns:
- the string which was read
- Throws:
java.io.IOException
- if there was an error reading data from the file
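A backwards line read over a random-access file can be sketched as below. This is an illustration only: java.io.RandomAccessFile is used as a stand-in for the PDFBox RandomAccess type of inputFile, ReadLineBackwardsSketch is a hypothetical name, and treating both CR and LF as the newline is an assumption. Bytes are mapped one-to-one to chars, which is why the text-only caveat above matters.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration; NOT the PDFBox implementation.
final class ReadLineBackwardsSketch {
    // Reads the line whose last byte is at 'offset', scanning towards the
    // start of the file until a CR or LF (or the start of the file) is hit.
    static String readLineBackwards(RandomAccessFile file, long offset) throws IOException {
        StringBuilder reversed = new StringBuilder();
        long pos = offset;
        while (pos >= 0) {
            file.seek(pos);
            int b = file.read();
            // Stop at a newline, or if the offset was past the end of the file.
            if (b < 0 || b == '\n' || b == '\r') {
                break;
            }
            reversed.append((char) b);
            pos--;
        }
        // Bytes were collected last-to-first, so reverse them.
        return reversed.reverse().toString();
    }
}
```

Seeking before every single-byte read keeps the sketch short; a real parser would more likely read a block into a buffer and scan that instead.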
-
readLine
protected java.lang.String readLine() throws java.io.IOException
This will read a line starting with the byte at offset and going forward until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Overrides:
readLine in class BaseParser
- Returns:
- the string which was read
- Throws:
java.io.IOException
- if there was an error reading data from the file
-
readWord
protected java.lang.String readWord() throws java.io.IOException
- Throws:
java.io.IOException
-
isRecursivlyRead
public boolean isRecursivlyRead()
- Returns:
- the current value of the recursivlyRead flag
-
setRecursivlyRead
public void setRecursivlyRead(boolean recursivlyRead)
- Parameters:
recursivlyRead
- the new value of the recursivlyRead flag
-
-