PDF Parser
PDF Utilities

The PDF Parser connector facilitates the extraction and analysis of content from PDF files, identifying potential security threats embedded within them.

The PDF Parser connector is a powerful tool for security teams using Swimlane Turbine, providing advanced capabilities to extract and analyze content from PDF files. It enables the detection of potential threats by identifying malicious entities such as embedded JavaScript or actions triggered upon opening a PDF. Additionally, it offers text extraction for further analysis or processing, with options for formatted output and decryption of password-protected documents. This integration empowers end users to automate the scrutiny of PDFs for enhanced security and streamlined incident response workflows.

Capabilities

The PDF Utilities connector has the following capabilities:

- Analyze a PDF for strings using pdfid
- Parse a PDF to extract text

Actions

Analyze PDF

Extract and analyze content from a PDF to detect potential malicious entities, such as embedded JavaScript or actions triggered on opening.

Endpoint method: GET

Input argument

Name          Type     Required   Description
eval          string   optional   Python to eval
attachments   array    required   File to be uploaded
file          string   optional   Parameter for Analyze PDF
file name     string   optional   Name of the resource
description   string   optional   Parameter for Analyze PDF

Output parameter

Name     Type     Description
output   string   PDF information

Example

    [ { "output": "string" } ]

Parse PDF

Extracts text from a provided PDF file attachment, enabling further analysis or processing.

Input argument

Name            Type      Required   Description
attachments     array     required   File to be uploaded
file            string    optional   Parameter for Parse PDF
file name       string    optional   Name of the resource
description     string    optional   Parameter for Parse PDF
format output   boolean   optional   If false, the entire PDF data will be returned as plain text. If true, the PDF data will be formatted to the extent possible (not perfect!).
password        string    optional   Password to decrypt the PDF file

Output parameter

Name                    Type     Description
pdf formatted content   array    Response content
page number             number   Output field page number
page content            string   Response content
pdf text content        string   Response content

Example

    [ { "pdf formatted content": [ {} ] } ]
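
The Analyze PDF action relies on pdfid, which triages a file by counting PDF names associated with active content. The Python sketch below illustrates that general idea only; it is not the connector's implementation. The list of names is an illustrative subset, and the function names `count_suspicious_names` and `looks_suspicious` are hypothetical. It also ignores obfuscated names (for example hex-escaped characters), which a real triage tool has to handle.

```python
import re

# Illustrative subset of PDF names often associated with active content.
# This list is an assumption for the sketch, not the connector's own list.
SUSPICIOUS_NAMES = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile")


def count_suspicious_names(pdf_bytes: bytes) -> dict:
    """Count how often each suspicious name appears in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops "/JS" from matching inside a longer name
        # that merely starts with the same characters.
        pattern = re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, pdf_bytes))
    return counts


def looks_suspicious(counts: dict) -> bool:
    """Flag the file if any suspicious name was seen at least once."""
    return any(counts.values())


# A tiny hand-written fragment that opens with a JavaScript action.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
report = count_suspicious_names(sample)
print(report)
print("suspicious:", looks_suspicious(report))
```

A non-zero count is only a prompt for closer inspection: many benign PDFs use /OpenAction to set the initial view, so a workflow would normally combine these counts with other signals before raising an alert.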