ClickHouse Connect
15 min
clickhouse connect facilitates direct querying and retrieval of analytical data from clickhouse databases, streamlining data driven operations clickhouse is an open source, column oriented database management system that enables users to generate analytical data reports in real time the clickhouse connect connector for swimlane turbine allows users to execute sql queries, retrieve datasets, and analyze data seamlessly within their security workflows by integrating with clickhouse connect, swimlane turbine users can enhance their security automation playbooks with powerful data querying and manipulation capabilities, enabling more informed decision making and rapid incident response this connector simplifies complex data operations, making them accessible to security professionals without the need for deep technical database knowledge limitations none to date supported versions this clickhouse connect connector uses the latest version sdk additional docs clickhouse connect authentication link https //clickhouse com/docs/integrations/python#gather your connection detailsclickhouse connect api documentation link https //clickhouse com/docs/integrations/python#streaming queries configuration prerequisites to effectively utilize the clickhouse connect connector, ensure you have the following prerequisites custom authentication with the clickhouse server using the following parameters username the username for authenticating with the clickhouse server password the password associated with the provided username host the network address of the clickhouse server instance authentication methods custom authentication url the address of the clickhouse server username the user account for database access password the password associated with the user account capabilities this clickhouse connect connector provides the following capabilities client query client query the client query method is the primary way to retrieve a single "batch" dataset from the clickhouse server clickhouse connect's documentation for this action can be found here https //clickhouse com/docs/integrations/python#client query method querycontexts clickhouse connect executes standard queries within a querycontext the querycontext contains the key structures that are used to build queries against the clickhouse database and the configuration used to process the result into a queryresult or other response data structure this includes the query itself parameters settings read formats time zone handling a querycontext can be acquired using the client create query context method this method takes the same parameters as the core query method this query context can then be passed to the query , query df , or query np methods as the context keyword argument instead of specifying individual arguments each time any additional arguments specified for the method call will override existing querycontext properties example usage client create query context(query='select value1, value2 from data table where key = {k\ int32}', parameters={'k' 2}, column oriented=true) result = client query(context=qc) assert result result set\[1]\[0] == 'second value2' qc set parameter('k', 1) result = client query(context=qc) assert result result set\[1]\[0] == 'first value2' querycontext supported fields when context=true , the following fields are available field name type description query str the sql query to execute (required) parameters dict query parameters to bind settings dict additional clickhouse settings for query execution query formats dict datatype formatting specification for result values column formats dict datatype formatting per column encoding str encoding used for clickhouse string columns server tz pytz utc server timezone configuration use none bool use python none for clickhouse null values column oriented bool return results as a sequence of columns instead of rows use numpy bool return results as numpy arrays max str len int maximum string length allowed defaults to 0 (unlimited) query tz str timezone name for query execution column tzs dict dictionary mapping column names to specific time zones use extended dtypes bool use extended pandas dtypes for null values as pandas bool return results as a pandas dataframe streaming bool enable streaming queries for large datasets apply server tz bool apply server timezone settings to datetime values external data dict external data configuration for file based queries by utilizing querycontexts, users can enhance query efficiency by dynamically modifying query parameters without reconstructing the full query, ensuring optimized and structured data retrieval configurations clickhouse connect asset configuration parameters parameter description type required url a target host string required port a target port integer optional username username string required password password string required verify ssl verify ssl certificate boolean optional http proxy a proxy to route requests through string optional actions client query retrieve a single 'batch' dataset from the clickhouse server using the provided query endpoint method get input argument name type required description query string required the clickhouse sql select or describe query parameters object optional see parameters description settings object optional see settings description query formats object optional datatype formatting specification for result values column formats object optional datatype formatting per column encoding string optional encoding used to encode clickhouse string columns into python strings python defaults to utf 8 if not set use none boolean optional use python none type for clickhouse nulls if false, use a datatype default (such as 0) for clickhouse nulls note defaults to false for numpy/pandas for performance reasons column oriented boolean optional return the results as a sequence of columns rather than a sequence of rows helpful for transforming python data to other column oriented data formats query tz string optional a timezone name from the zoneinfo database this timezone will be applied to all datetime or pandas timestamp objects returned by the query column tzs object optional a dictionary of column name to timezone name like query tz, but allows specifying different timezones for different columns use extended dtypes boolean optional use pandas extended dtypes (like stringarray), and pandas na and pandas nat for clickhouse null values applies only to query df and query df stream methods external data object optional an externaldata object containing file or binary data to use with the query file path string optional path to a file on the local system path to read the external data from either file path or data is required file name string optional the name of the external data "file" if not provided, will be determined from the file path (without extensions) data string optional the external data in binary form (instead of being read from a file) either data or file path is required fmt string optional the clickhouse input format of the data defaults to tsv types array optional a list of column data types in the external data if a string, types should be separated by commas either types or structure is required structure array optional a list of column name + data type in the data (see examples) either structure or types is required mime type string optional optional mime type of the file data currently clickhouse ignores this http subheader context boolean optional a reusable querycontext object can be used to encapsulate the above method arguments output parameter type description query id string unique identifier json result array result of the operation a string output field a column names array name of the resource summary object output field summary read rows string output field read rows read bytes string output field read bytes written rows string output field written rows written bytes string output field written bytes total rows to read string output field total rows to read result rows string result of the operation result bytes string result of the operation elapsed ns string output field elapsed ns query id string unique identifier first item object output field first item a string output field a first row array output field first row example \[ { "query id" "02d6023d 526e 4fc8 8acd ede4c3ab5d81", "json result" \[ {} ], "column names" \[ "a" ], "summary" { "read rows" "1", "read bytes" "14", "written rows" "0", "written bytes" "0", "total rows to read" "1", "result rows" "1", "result bytes" "269", "elapsed ns" "2071501", "query id" "02d6023d 526e 4fc8 8acd ede4c3ab5d81" }, "first item" { "a" "12345" }, "first row" \[ "12345" ] } ]