# PDF Parser
PDF Utilities

The PDF Parser connector facilitates the extraction and analysis of content from PDF files, identifying potential security threats embedded within.

The PDF Parser connector is a powerful tool for security teams using Swimlane Turbine, providing advanced capabilities to extract and analyze content from PDF files. It enables the detection of potential threats by identifying malicious entities such as embedded JavaScript or actions triggered upon opening a PDF. Additionally, it offers text extraction for further analysis or processing, with options for formatted output and decryption of password-protected documents. This integration empowers end users to automate the scrutiny of PDFs for enhanced security and streamlined incident response workflows.

## Capabilities

The PDF Utilities connector has the following capabilities:

- Analyze a PDF for strings using pdfid
- Parse a PDF to extract text

## Actions

### Analyze PDF

Extract and analyze content from a PDF to detect potential malicious entities, such as embedded JavaScript or actions triggered on opening.

Endpoint method: GET

#### Input

| Argument name | Type | Required | Description |
| --- | --- | --- | --- |
| eval | string | Optional | Python to eval |
| attachments | array | Required | File to be uploaded |
| attachments file | string | Optional | Parameter for Analyze PDF |
| attachments file name | string | Optional | Name of the resource |
| attachments description | string | Optional | Parameter for Analyze PDF |

Input example:

`{"eval": "string", "attachments": [{"file": "string", "file name": "example name", "description": "string"}]}`

#### Output

| Parameter | Type | Description |
| --- | --- | --- |
| output | string | PDF information |

Output example:

`{"output": "string"}`

### Parse PDF

Extracts text from a provided PDF file attachment, enabling further analysis or processing.

#### Input

| Argument name | Type | Required | Description |
| --- | --- | --- | --- |
| attachments | array | Required | File to be uploaded |
| attachments file | string | Optional | Parameter for Parse PDF |
| attachments file name | string | Optional | Name of the resource |
| attachments description | string | Optional | Parameter for Parse PDF |
| format output | boolean | Optional | If false, the entire PDF data is returned as plain text. If true, the PDF data is formatted to the extent possible (not perfect!). |
| password | string | Optional | Password to decrypt the PDF file |

Input example:

`{"attachments": [{"file": "string", "file name": "example name", "description": "string"}], "format output": true, "password": "string"}`

#### Output

| Parameter | Type | Description |
| --- | --- | --- |
| pdf formatted content | array | Response content |
| pdf formatted content page number | number | Response content |
| pdf formatted content page content | string | Response content |
| pdf text content | string | Response content |

Output example:

`{"pdf formatted content": [{"page number": 1, "page content": "some content"}]}`

## Response Headers

| Header | Description | Example |
| --- | --- | --- |
| Content-Type | The media type of the resource | application/json |
| Date | The date and time at which the message was originated | Thu, 01 Jan 2024 00:00:00 GMT |
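
The Parse PDF input and output described above can be handled with a few lines of Python. The following is a minimal sketch, not the connector's own code: the helper names `build_parse_pdf_input` and `extract_text` are hypothetical, the field names are copied as they appear in this page (the live connector may spell them differently, for example with underscores), and the fallback to `pdf text content` when no per-page list is returned is an assumption based on the output table.

```python
import json

# Field names copied from the tables in this page. These are assumptions:
# the live connector may spell them differently (e.g. with underscores).
FILE_KEY = "file"
FILE_NAME_KEY = "file name"
FORMAT_KEY = "format output"
PAGES_KEY = "pdf formatted content"
PAGE_NUMBER_KEY = "page number"
PAGE_CONTENT_KEY = "page content"
TEXT_KEY = "pdf text content"


def build_parse_pdf_input(file_value, file_name, format_output=False, password=None):
    """Assemble an input object shaped like the Parse PDF input example.

    `file_value` is passed through as an opaque string; this page does not
    say how the file content is encoded.
    """
    payload = {
        "attachments": [{FILE_KEY: file_value, FILE_NAME_KEY: file_name}],
        FORMAT_KEY: format_output,
    }
    if password is not None:  # only relevant for password-protected PDFs
        payload["password"] = password
    return payload


def extract_text(output):
    """Return the document text from a Parse PDF output object.

    Joins the per-page entries in page order when they are present;
    otherwise falls back to the plain-text field (assumed behavior).
    """
    pages = output.get(PAGES_KEY)
    if pages:
        ordered = sorted(pages, key=lambda page: page[PAGE_NUMBER_KEY])
        return "\n".join(page[PAGE_CONTENT_KEY] for page in ordered)
    return output.get(TEXT_KEY, "")


if __name__ == "__main__":
    request = build_parse_pdf_input("string", "example name", format_output=True)
    print(json.dumps(request, indent=2))

    sample_output = {PAGES_KEY: [{PAGE_NUMBER_KEY: 1, PAGE_CONTENT_KEY: "some content"}]}
    print(extract_text(sample_output))
```

Keeping the field names in constants means only one place needs to change if the connector's actual key spelling differs from what is shown here.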