# PhantomJs Cloud
PhantomJs Cloud provides web page rendering and scraping services via an API, enabling automated website information retrieval and screenshot capture.

PhantomJs Cloud is a web automation and data extraction platform that captures website information and screenshots programmatically. The PhantomJs Cloud connector for Swimlane Turbine automates the retrieval of detailed website information in JSON format, including rendering and structure, and captures high-quality screenshots of web pages. Security teams can use it to collect web intelligence and visual evidence for investigations and monitoring without writing complex code.

## Prerequisites

To use the PhantomJs Cloud connector within Turbine, you need the following:

- **API key authentication**
  - **URL**: The endpoint for the PhantomJs Cloud API service.
  - **API key**: A unique identifier used to authenticate requests to PhantomJs Cloud.

## Notes

- **Lower your cost**: Use `requestSettings: {"doneWhen": [{"event": "domReady"}]}`. In the PhantomJs Cloud example, a page requested this way loads in about 6 seconds, whereas the same page without `domReady` loads in about 20 seconds. The difference is that rendering happens as soon as the browser signals `domReady`; normally PhantomJs Cloud waits until all resources (ads, CSS, images, etc.) are loaded.
- **Track cost**: View the HTTP response headers sent back with every API call. They include details such as the cost of your page and how many credits remain. See the `pjsc-billing-*` headers.
- **What resources failed to load**: Some pages have resources (images, CSS, fonts, etc.) that fail to load, usually because of broken links. You can track this by inspecting the HTTP response headers. See the `pjsc-content-resource-*` headers.
- **All metadata about the page and its processing**: Set the `outputAsJson: true` property. This returns your content along with details about load times, cookies, iframes, sub-resources, etc.

## Actions setup

### Example requests
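The cost tip from the notes (render on `domReady` instead of waiting for every resource) can be sketched as a small payload builder. This is a minimal illustration, not the connector's implementation: `build_request` is a hypothetical helper, and the camelCase field names (`renderType`, `outputAsJson`, `requestSettings`, `doneWhen`) are assumed to follow the PhantomJs Cloud API spelling.

```python
import json


def build_request(url, render_type="jpg", output_as_json=False, done_event="domReady"):
    """Build a minimal PhantomJs Cloud request body (hypothetical helper)."""
    return {
        "url": url,
        "renderType": render_type,
        "outputAsJson": output_as_json,
        # Render as soon as the browser signals the event instead of
        # waiting for every resource (ads, CSS, images) to finish loading.
        "requestSettings": {"doneWhen": [{"event": done_event}]},
    }


payload = build_request("http://www.google.com")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be pasted into the action inputs or serialized as the request body.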
**Minimal request**

```json
{
  "url": "http://www.google.com",
  "renderType": "jpg",
  "outputAsJson": false,
  "requestSettings": {
    "doneWhen": [
      { "event": "domReady" }
    ]
  }
}
```

**Request with all inputs**

```json
{
  "url": "http://google.com",
  "content": null,
  "urlSettings": {
    "operation": "GET",
    "encoding": "utf8",
    "headers": {},
    "data": null
  },
  "renderType": "jpg",
  "outputAsJson": true,
  "requestSettings": {
    "doneWhen": [
      { "event": "domReady" }
    ],
    "ignoreImages": false,
    "disableJavascript": false,
    "userAgent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/534.34 (KHTML, like Gecko) Safari/534.34 PhantomJS/2.0.0 (PhantomJsCloud.com/2.0.1)",
    "authentication": {
      "username": "guest",
      "password": "guest"
    },
    "xssAuditingEnabled": false,
    "webSecurityEnabled": false,
    "resourceWait": 15000,
    "resourceTimeout": 35000,
    "maxWait": 35000,
    "waitInterval": 1000,
    "stopOnError": false,
    "resourceModifier": [],
    "customHeaders": {},
    "clearCache": false,
    "clearCookies": false,
    "cookies": [],
    "deleteCookies": []
  },
  "suppressJson": [
    "events.value.resourceRequest.headers",
    "events.value.resourceResponse.headers",
    "frameData.content",
    "frameData.childFrames"
  ],
  "renderSettings": {
    "quality": 70,
    "pdfOptions": {
      "border": null,
      "footer": {
        "firstPage": null,
        "height": "1cm",
        "lastPage": null,
        "onePage": null,
        "repeating": "%pageNum%/%numPages%"
      },
      "format": "letter",
      "header": null,
      "height": null,
      "orientation": "portrait",
      "width": null
    },
    "clipRectangle": null,
    "renderIFrame": null,
    "viewport": {
      "height": 1280,
      "width": 1280
    },
    "zoomFactor": 1,
    "passThroughHeaders": false
  },
  "scripts": {
    "domReady": [],
    "loadFinished": []
  }
}
```

## Configurations

### API Key Authentication

Authenticates using an API key.

#### Configuration parameters

| Parameter | Description | Type | Required |
| --- | --- | --- | --- |
| URL | A URL to the target host | string | Required |
| apikey | API key | string | Required |
| Verify SSL | Verify SSL certificate | boolean | Optional |
| HTTP Proxy | A proxy to route requests through | string | Optional |

## Actions

### Get Website Information

Retrieve detailed information about a website in JSON format, including rendering and structure, using the specified `url` and `renderType`.

**Endpoint method:** POST

#### Input

| Argument name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Required | URL endpoint for the request |
| content | object | Optional | Response content |
| urlSettings | object | Optional | URL endpoint for the request |
| operation | string | Optional | Parameter for Get Website Information |
| encoding | string | Optional | Parameter for Get Website Information |
| headers | object | Optional | HTTP headers for the request |
| data | object | Optional | Response data |
| renderType | string | Required | Type of the resource |
| outputAsJson | boolean | Required | Parameter for Get Website Information |
| requestSettings | object | Optional | Parameter for Get Website Information |
| doneWhen | array | Optional | Parameter for Get Website Information |
| event | string | Optional | Parameter for Get Website Information |
| ignoreImages | boolean | Optional | Parameter for Get Website Information |
| disableJavascript | boolean | Optional | Parameter for Get Website Information |
| userAgent | string | Optional | Parameter for Get Website Information |
| authentication | object | Optional | Parameter for Get Website Information |
| username | string | Optional | Name of the resource |
| password | string | Optional | Parameter for Get Website Information |
| xssAuditingEnabled | boolean | Optional | Parameter for Get Website Information |
| webSecurityEnabled | boolean | Optional | Parameter for Get Website Information |
| resourceWait | number | Optional | Parameter for Get Website Information |
| resourceTimeout | number | Optional | Parameter for Get Website Information |
| maxWait | number | Optional | Parameter for Get Website Information |
| waitInterval | number | Optional | Parameter for Get Website Information |
| stopOnError | boolean | Optional | Error message if any |

#### Output

| Parameter | Type | Description |
| --- | --- | --- |
| status_code | number | HTTP status code of the response |
| reason | string | Response reason phrase |
| statusCode | number | Status value |
| statusMessage | object | Status value |
| originalRequest | object | Output field originalRequest |
| webSecurityEnabled | boolean | Output field webSecurityEnabled |
| pages | array | Output field pages |
| url | string | URL endpoint for the request |
| content | object | Response content |
| urlSettings | object | URL endpoint for the request |
| operation | string | Output field operation |
| encoding | string | Output field encoding |
| headers | object | HTTP headers for the request |
| data | object | Response data |
| renderType | string | Type of the resource |
| outputAsJson | boolean | Output field outputAsJson |
| requestSettings | object | Output field requestSettings |
| doneWhen | array | Output field doneWhen |
| event | string | Output field event |
| ignoreImages | boolean | Output field ignoreImages |
| disableJavascript | boolean | Output field disableJavascript |
| userAgent | string | Output field userAgent |
| authentication | object | Output field authentication |
| username | string | Name of the resource |
| password | string | Output field password |

#### Example

```json
[
  {
    "status_code": 200,
    "response_headers": {
      "pjsc-billing-credit-cost": "0.000151384",
      "pjsc-billing-elapsedms": "2192",
      "pjsc-billing-bytes": "257,914",
      "pjsc-billing-proxy-ingress-bytes": "0",
      "pjsc-billing-proxy-ingress-cost": "0",
      "pjsc-billing-total-credits-remaining": "0.048475749",
      "pjsc-billing-daily-subscription-credits-remaining": "0.047475749",
      "pjsc-billing-prepaid-credits-remaining": "0.001",
      "local-address": "190.195.70.130",
      "pjsc-backend-id": "1 05np",
      "pjsc-content-status-code": "200",
      "pjsc-content-name": "www.google.com.jpeg",
      "pjsc-content-url": "http://www.google.com/",
      "pjsc-content-page-exec-last-waited-on": "waitinterval(1000) not yet met still need to wait 40",
      "pjsc-content-done-detail": "{\"reason\":\"match doneWhen {\\\"event\\\":\\\"domReady\\\"}\",\"statusCode\":200}"
    },
    "reason": "OK",
    "json_body": {
      "statusCode": 200,
      "statusMessage": null,
      "originalRequest": {},
      "pageResponses": [],
      "meta": {},
      "content": {},
      "queryJson": []
    }
  }
]
```

### Get Website Screenshot

Capture a screenshot of a specified website using PhantomJs Cloud. Requires the `url` and `renderType`.

**Endpoint method:** POST

#### Input

| Argument name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Required | URL endpoint for the request |
| content | object | Optional | Response content |
| urlSettings | object | Optional | URL endpoint for the request |
| operation | string | Optional | Parameter for Get Website Screenshot |
| encoding | string | Optional | Parameter for Get Website Screenshot |
| headers | object | Optional | HTTP headers for the request |
| data | object | Optional | Response data |
| renderType | string | Required | Type of the resource |
| outputAsJson | boolean | Optional | Parameter for Get Website Screenshot |
| requestSettings | object | Optional | Parameter for Get Website Screenshot |
| doneWhen | array | Optional | Parameter for Get Website Screenshot |
| event | string | Optional | Parameter for Get Website Screenshot |
| ignoreImages | boolean | Optional | Parameter for Get Website Screenshot |
| disableJavascript | boolean | Optional | Parameter for Get Website Screenshot |
| userAgent | string | Optional | Parameter for Get Website Screenshot |
| authentication | object | Optional | Parameter for Get Website Screenshot |
| username | string | Optional | Name of the resource |
| password | string | Optional | Parameter for Get Website Screenshot |
| xssAuditingEnabled | boolean | Optional | Parameter for Get Website Screenshot |
| webSecurityEnabled | boolean | Optional | Parameter for Get Website Screenshot |
| resourceWait | number | Optional | Parameter for Get Website Screenshot |
| resourceTimeout | number | Optional | Parameter for Get Website Screenshot |
| maxWait | number | Optional | Parameter for Get Website Screenshot |
| waitInterval | number | Optional | Parameter for Get Website Screenshot |
| stopOnError | boolean | Optional | Error message if any |

#### Output

| Parameter | Type | Description |
| --- | --- | --- |
| status_code | number | HTTP status code of the response |
| reason | string | Response reason phrase |
| file | object | File |
| file_name | string | Name of the resource |
| file | string | Output field file |

#### Example

```json
[
  {
    "status_code": 200,
    "response_headers": {
      "pjsc-billing-credit-cost": "0.000131835",
      "pjsc-billing-elapsedms": "1754",
      "pjsc-billing-bytes": "252,335",
      "pjsc-billing-proxy-ingress-bytes": "0",
      "pjsc-billing-proxy-ingress-cost": "0",
      "pjsc-billing-total-credits-remaining": "0.048495298",
      "pjsc-billing-daily-subscription-credits-remaining": "0.047495298",
      "pjsc-billing-prepaid-credits-remaining": "0.001",
      "local-address": "190.195.70.130",
      "pjsc-backend-id": "1 727r",
      "pjsc-content-status-code": "200",
      "pjsc-content-name": "www.google.com.jpeg",
      "pjsc-content-url": "http://www.google.com/",
      "pjsc-content-page-exec-last-waited-on": "waitinterval(1000) not yet met still need to wait 43",
      "pjsc-content-done-detail": "{\"reason\":\"match doneWhen {\\\"event\\\":\\\"domReady\\\"}\",\"statusCode\":200}"
    },
    "reason": "OK",
    "response_text": "\ufffd\ufffd\ufffd\ufffd\u0000\u0010jfif\u0000\u0001\u0001\u0000\u0000\u0001\u0000\u0001\u0000\u0000\ufffd\ufffd\u0002(icc profile\u0000\u0001\u0001\u0000\u0000\u0002\u0018\u0000\u0000\u0000\u0000\u00040\u0000\u0000mntrrgb xyz \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000acsp\u0000\u0000 "
  }
]
```

## Response headers

| Header | Description | Example |
| --- | --- | --- |
| Access-Control-Expose-Headers | HTTP response header Access-Control-Expose-Headers | WWW-Authenticate,Server-Authorization |
| Alt-Svc | HTTP response header Alt-Svc | `h3=":443"; ma=2592000,h3-29=":443"; ma=2592000` |
| Cache-Control | Directives for caching mechanisms | no-cache |
| Content-Disposition | HTTP response header Content-Disposition | `filename="www.google.com.jpeg"` |
| Content-Encoding | HTTP response header Content-Encoding | gzip |
| Content-Type | The media type of the resource | image/jpeg |
| Date | The date and time at which the message was originated | Wed, 28 Dec 2022 17:05:16 GMT |
| local-address | HTTP response header local-address | 190.195.70.130 |
| pjsc-backend-id | HTTP response header pjsc-backend-id | 1 05np |
| pjsc-billing-bytes | HTTP response header pjsc-billing-bytes | 252,335 |
| pjsc-billing-credit-cost | HTTP response header pjsc-billing-credit-cost | 0.000151384 |
| pjsc-billing-daily-subscription-credits-remaining | HTTP response header pjsc-billing-daily-subscription-credits-remaining | 0.047495298 |
| pjsc-billing-elapsedms | HTTP response header pjsc-billing-elapsedms | 1754 |
| pjsc-billing-prepaid-credits-remaining | HTTP response header pjsc-billing-prepaid-credits-remaining | 0.001 |
| pjsc-billing-proxy-ingress-bytes | HTTP response header pjsc-billing-proxy-ingress-bytes | 0 |
| pjsc-billing-proxy-ingress-cost | HTTP response header pjsc-billing-proxy-ingress-cost | 0 |
| pjsc-billing-total-credits-remaining | HTTP response header pjsc-billing-total-credits-remaining | 0.048475749 |
| pjsc-content-done-detail | HTTP response header pjsc-content-done-detail | `{"reason":"match doneWhen {"event":"domReady"}","statusCode":200 }` |
| pjsc-content-event-phase | HTTP response header pjsc-content-event-phase | "load" |
| pjsc-content-name | HTTP response header pjsc-content-name | www.google.com.jpeg |
| pjsc-content-page-exec-last-waited-on | HTTP response header pjsc-content-page-exec-last-waited-on | waitinterval(1000) not yet met still need to wait 43 |
| pjsc-content-resource-aborted | HTTP response header pjsc-content-resource-aborted | 0 |
| pjsc-content-resource-active | HTTP response header pjsc-content-resource-active | 1 |
| pjsc-content-resource-complete | HTTP response header pjsc-content-resource-complete | 12 |
| pjsc-content-resource-failed | HTTP response header pjsc-content-resource-failed | 0 |
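As a rough illustration of tracking cost and failed resources from these headers, the sketch below reads a few billing and resource counters out of a response-headers mapping. It assumes the header names are hyphenated (for example `pjsc-billing-credit-cost`) and that values arrive as strings, as in the examples above; `summarize_pjsc_headers` and the keys it returns are hypothetical names chosen for this example.

```python
def summarize_pjsc_headers(headers):
    """Pull cost and resource counters out of a response-headers mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    def number(name, default=0.0):
        # Values arrive as strings; some use thousands separators ("252,335").
        raw = h.get(name)
        return float(raw.replace(",", "")) if raw is not None else default

    return {
        "credit_cost": number("pjsc-billing-credit-cost"),
        "credits_remaining": number("pjsc-billing-total-credits-remaining"),
        "bytes": int(number("pjsc-billing-bytes")),
        "resources_failed": int(number("pjsc-content-resource-failed")),
        "resources_complete": int(number("pjsc-content-resource-complete")),
    }


# Sample values taken from the screenshot example in this document.
sample = {
    "pjsc-billing-credit-cost": "0.000131835",
    "pjsc-billing-total-credits-remaining": "0.048495298",
    "pjsc-billing-bytes": "252,335",
    "pjsc-content-resource-complete": "12",
    "pjsc-content-resource-failed": "0",
}
summary = summarize_pjsc_headers(sample)
print(summary)
```

A playbook step could, for instance, flag a run when `resources_failed` is greater than zero or when `credits_remaining` drops below a chosen threshold.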