# Working with Turbine Schema
> Turbine Schema was formerly known as Turbine Extensible Data Schema, or TEDS.

## Overview

Turbine Schema establishes a standardized data schema aimed at creating a unified framework to support seamless collaboration between security teams and tools. It promotes consistent data formats and schemas within Swimlane Turbine products, enhancing the detection, analysis, and response capabilities for security incidents. This standardization also simplifies the integration process for various cybersecurity tools, reducing complexity and minimizing integration effort.

This guide covers:

- What Turbine Schema is and why it matters
- How it relates to interfaces
- Where and how it is used across Turbine solutions
- Best practices, anti-patterns, and troubleshooting

### Schema references

Field-level definitions for each solution area live in these documents:

- docid\ i0yap22xufzu9tbegkare — Alert, Email, Observable, Enrichment, and supporting objects for the SOC Solutions Bundle
- docid 7efe4hvzv8uzngfkwhjtr — extended Alert and Email objects for the AI SOC solution
- docid\ pqh1nonmkovd2baaipjxm — Vulnerability Finding, Asset, and Remediation/Ticket objects for VRM

### Interface catalogs

Interfaces define the input/output contracts that use Turbine Schema objects. For interface documentation, see docid\ sgosywgianmcsfqtqch n.

## How Turbine Schema relates to interfaces

Turbine Schema and interfaces serve complementary roles:

- **Turbine Schema defines the data models**: the structure and fields of objects like Alert, Email, Observable, and Vulnerability Finding.
- **Interfaces define the input/output contracts**: they specify which Turbine Schema objects a component accepts and produces.

When you apply an interface to a component, the interface configures the component's input and output schemas using Turbine Schema objects. This means components that use the same interface automatically share the same data format, making them interchangeable.

## Where Turbine Schema is used

Turbine Schema is integrated throughout Swimlane Turbine in several key areas.

### SOC Solutions Bundle

The SOC Solutions Bundle uses Turbine Schema extensively for security operations workflows:

- **Alert Triage solution**: processes alerts from SIEM, XDR, and EDR systems using the Alert object schema. Alerts are ingested via webhooks or API requests and transformed into standardized Alert objects following Turbine Schema conventions.
- **Phishing Triage solution**: processes suspected phishing emails using the Email and Phishing Email Report object schemas. Email data is extracted and structured according to Turbine Schema standards for consistent processing and analysis.
- **Threat Intelligence solution**: uses Observable and Enrichment object schemas to standardize threat intelligence data from various providers, ensuring consistent enrichment results across different sources.

### AI SOC solution

The AI SOC solution extends the classic SOC schemas with:

- Additional fields for enhanced alert context (priority, host criticality, MITRE D3FEND mappings, supporting evidence)
- Email authentication checks (SPF, DMARC, DKIM)
- Search-based ingestion parameters for alerts and emails

### Vulnerability Response Management

The VRM solution uses its own set of Turbine Schema objects:

- **Vulnerability Finding**: captures scan results from tools like Tenable, Qualys, and Rapid7, with CVSS/EPSS scoring, exploit intelligence, and remediation tracking
- **Enriched Vulnerability Finding**: extends findings with asset criticality and zone context
- **Asset**: represents managed assets in the inventory
- **Remediation Item / Ticket**: tracks ITSM ticket creation and status

### Application field naming

When building applications in Swimlane Turbine, you can follow Turbine Schema naming conventions for your field keys to ensure compatibility with Turbine Schema-based workflows and integrations. This is especially important when:

- Creating applications that will receive data from Turbine Schema-compliant sources
- Building custom solutions that integrate with any Solutions Bundle
- Ensuring data consistency across multiple systems and integrations

### Playbook actions

Turbine Schema schemas can be applied to playbook actions through input schema references. When configuring record actions (create, update/create, search), you can reference Turbine Schema-based schemas to ensure your playbooks accept and process data in standardized formats.

## Business use case

For customers who prefer to develop their own solutions, data management can present a significant challenge. Standard data fields and naming conventions become essential for maintaining consistency and avoiding data loss or errors. For instance, in scenarios where customers use a database like MongoDB Atlas Event Manager, even though Swimlane does not provide a direct solution for this specific database, the data still needs to be accurately saved to the database or application. A schema with consistent naming conventions is crucial; mismatches between field names can lead to lost or mishandled records. For customers unfamiliar with Turbine Schema references, it is important to ensure that their applications adhere to correct naming conventions to avoid data errors and ensure reliable data management across systems.

## How to use Turbine Schema in your workflows

### Using Turbine Schema in applications

When building applications that will work with Turbine Schema-based data:

- **Follow Turbine Schema naming conventions**: use the field keys defined in the relevant schema reference document when creating application fields. For example, if creating an alert application, use `alert_uid` as the field key for the unique identifier, not `alertId` or `alert id`.
- **Match field types**: ensure your application field types match the Turbine Schema types. For example:
  - Use string fields for `alert_title`, `alert_description`
  - Use date & time fields for `alert_created_timestamp`, `alert_start_timestamp`
  - Use multi-select or array fields for `alert_categories`, `alert_impacted_hostnames`
  - Use reference fields for nested objects like observables or alert rules
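The naming and typing rules above can be checked mechanically before data reaches an application. The sketch below is a minimal illustration, not Swimlane tooling: the snake_case rule and the field names follow the conventions in this guide, while the `EXPECTED_TYPES` map is a hypothetical subset of the Alert object chosen for the example.

```python
import re

# snake_case per the naming guidelines: lowercase words separated by underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical subset of Alert field types, for illustration only.
EXPECTED_TYPES = {
    "alert_uid": str,
    "alert_title": str,
    "alert_created_timestamp": str,   # ISO 8601 string
    "alert_categories": list,
    "alert_impacted_hostnames": list,
    "alert_risk_score": int,
}

def check_fields(payload: dict) -> list[str]:
    """Return a list of human-readable problems found in the payload."""
    problems = []
    for key, value in payload.items():
        if not SNAKE_CASE.match(key):
            problems.append(f"{key!r} is not snake_case")
        expected = EXPECTED_TYPES.get(key)
        if expected and not isinstance(value, expected):
            problems.append(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return problems

print(check_fields({
    "alertUid": "alert-12345",               # wrong case
    "alert_categories": "phishing, malware", # string instead of array
}))
```

Running a check like this in a transformation step catches the two most common mistakes described later in this guide (camelCase keys and comma-separated strings in array fields) before they cause validation errors downstream.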
- **Required vs. optional fields**: mark fields as required based on the Turbine Schema requirements. Fields marked as "required" should be required in your application, while "recommended" and "optional" fields can be optional.

### Using Turbine Schema in playbooks

When building playbooks that process Turbine Schema-formatted data:

- **Apply interfaces**: use interfaces that implement Turbine Schema schemas. These interfaces automatically configure your component's input and output schemas to match Turbine Schema standards.
- **Reference schemas**: when configuring record actions, you can reference Turbine Schema-based input schemas to ensure your playbooks accept data in the correct format.
- **Data transformation**: use transformation functions to map incoming data to Turbine Schema field names if your source data uses different naming conventions.

### Example workflow

Here is an example of how Turbine Schema is used in an alert triage workflow:

1. **Alert ingestion**: a webhook receives an alert from a SIEM system.
2. **Schema application**: the alert data is validated against the Alert Turbine Schema.
3. **Observable extraction**: observables (IPs, URLs, file hashes) are extracted and structured using the Observable schema.
4. **Enrichment**: each observable is enriched using threat intelligence providers, with results following the Enrichment schema.
5. **Case creation**: a case is created in the Case and Incident Management application using Turbine Schema field names.
6. **Analysis**: the case is analyzed using Hero AI or manual review, with all data following Turbine Schema conventions.

This standardized approach ensures that data flows seamlessly between different components and systems, regardless of the underlying technology or vendor.

## Guidelines for attribute names

- Attribute names must be valid UTF-8 sequences.
- Use lowercase for all attribute names.
- Separate words with underscores.
- Apply present tense unless the attribute refers to historical information.
- Use singular or plural forms appropriately to match the field content. Example: use `events_per_sec` instead of `event_per_sec`.
- If an attribute represents multiple entities, use a pluralized name and set the value type as an array. Example: `process_loaded_modules` stores a list of module names.
- Avoid word repetition. Example: instead of `host_host_ip`, use `host_ip`.
- Minimize abbreviations, with exceptions for commonly recognized terms (for example, `ip`, `os`, `geo`).

## Attribute levels

The schema categorizes attributes into three levels:

- **Core attributes**: common across all use cases, designated as either required or recommended.
- **Optional attributes**: relevant to specific use cases or allow flexibility based on the context.
- **Reserved attributes**: managed by the logging system; they should not be used in event data.

## Extending the schema

The Open Cybersecurity Schema Framework allows for extensions through additional attributes, objects, and event classes. To extend the schema, create a new directory mirroring the top-level schema directory structure. This directory can include the following files and subdirectories:

- `categories.json`: defines a new event category and reserves a range of class IDs
- `dictionary.json`: defines new attributes
- `events/`: contains definitions for new event classes
- `objects/`: holds definitions for new objects

## Best practices

### Field naming consistency

- **Always use lowercase**: field keys must be lowercase (for example, `alert_uid`, not `Alert_UID` or `alertUid`).
- **Use underscores**: separate words with underscores (for example, `alert_created_timestamp`, not `alertCreatedTimestamp` or `alert-created-timestamp`).
- **Follow Turbine Schema conventions**: use the exact field keys defined in the reference documents to ensure compatibility.
- **Avoid abbreviations**: unless they are commonly recognized (for example, `ip`, `os`, `geo`), spell out full words.

### Schema compliance

- **Validate against schemas**: when building custom solutions, validate your data structures against Turbine Schema before deployment.
- **Handle optional fields**: design your workflows to handle missing optional fields gracefully.
- **Required fields**: always include required fields; missing required fields will cause validation errors.
- **Type matching**: ensure data types match the schema definitions (for example, arrays for multi-value fields, datetime for timestamps).

### Integration tips

- **Start with Solutions Bundles**: if you are new to Turbine Schema, start by using a Solutions Bundle, which already implements Turbine Schema correctly.
- **Use interfaces**: apply interfaces to your components to automatically get Turbine Schema-compliant schemas.
- **Test transformations**: when mapping data from external sources to Turbine Schema format, test your transformations thoroughly.
- **Document deviations**: if you need to extend Turbine Schema, document your extensions clearly.

## Common anti-patterns

Understanding what not to do is just as important as following best practices. The following examples show common mistakes and how to correct them.

### Incorrect field naming

**Problem**: using camelCase or mixed case instead of snake_case.

Wrong:

```json
{
  "alertUid": "alert-12345",
  "alertCreatedTimestamp": "2025-01-15T10:30:00Z",
  "alertImpactedHostnames": ["host1", "host2"]
}
```

Correct:

```json
{
  "alert_uid": "alert-12345",
  "alert_created_timestamp": "2025-01-15T10:30:00Z",
  "alert_impacted_hostnames": ["host1", "host2"]
}
```

**Why it fails**: field names are case-sensitive. Components expecting `alert_uid` will not find `alertUid`, causing data to be lost or validation errors.

### Wrong field types

**Problem**: using strings for array fields or incorrect data types.

Wrong:

```json
{
  "alert_categories": "phishing, malware",
  "alert_impacted_hostnames": "workstation-01",
  "alert_risk_score": "85"
}
```

Correct:

```json
{
  "alert_categories": ["phishing", "malware"],
  "alert_impacted_hostnames": ["workstation-01"],
  "alert_risk_score": 85
}
```

**Why it fails**: array fields must be arrays, not comma-separated strings, and integer fields must be numbers, not strings. Type mismatches cause validation errors and prevent proper data processing.

### Missing required fields

**Problem**: omitting required fields from Turbine Schema objects.

Wrong:

```json
{
  "alert_title": "Suspicious activity",
  "alert_severity": "high"
}
```

Correct:

```json
{
  "alert_uid": "alert-12345",
  "alert_title": "Suspicious activity",
  "alert_severity": "high",
  "raw_alert": {}
}
```

**Why it fails**: required fields (`alert_uid` and `raw_alert` for Alert objects) must always be present. Missing required fields cause schema validation failures and prevent data ingestion.

### Incorrect nested object structure

**Problem**: not following the correct structure for nested objects like observables or enrichments.

Wrong:

```json
{
  "observables": {
    "observable_type": "ipv4_public",
    "observable_value": "203.0.113.1"
  }
}
```

Correct:

```json
{
  "observables": [
    {
      "observable_type": "ipv4_public",
      "observable_value": "203.0.113.1"
    }
  ]
}
```

**Why it fails**: `observables` must be an array, even for a single observable. Using an object instead of an array causes type validation errors.

### Incorrect observable type values

**Problem**: using invalid values for the `observable_type` field.

Wrong:

```json
{
  "observable_type": "ip",
  "observable_value": "203.0.113.1"
}
```

Correct:

```json
{
  "observable_type": "ipv4_public",
  "observable_value": "203.0.113.1"
}
```

**Why it fails**: `observable_type` must use exact Turbine Schema values: `ipv4_public`, `ipv4_private`, `ipv6_public`, `ipv6_private`, `url`, `domain`, `email`, `sha256`, `sha1`, `md5`, or `file`. Invalid types cause validation errors and prevent observable processing.

### Date format inconsistencies

**Problem**: using incorrect date/time formats.

Wrong:

```json
{
  "alert_created_timestamp": "2025-01-15 10:30:00",
  "alert_start_timestamp": "Jan 15, 2025 10:30 AM"
}
```

Correct:

```json
{
  "alert_created_timestamp": "2025-01-15T10:30:00Z",
  "alert_start_timestamp": "2025-01-15T10:28:15Z"
}
```

**Why it fails**: all datetime fields must use ISO 8601 format with UTC timezone (`Z` suffix). Incorrect formats cause parsing errors and time-based queries to fail.

### Mixing naming conventions

**Problem**: inconsistent naming within the same object.

Wrong:

```json
{
  "alert_uid": "alert-12345",
  "alertTitle": "Suspicious activity",
  "alert_created_timestamp": "2025-01-15T10:30:00Z"
}
```

Correct:

```json
{
  "alert_uid": "alert-12345",
  "alert_title": "Suspicious activity",
  "alert_created_timestamp": "2025-01-15T10:30:00Z"
}
```

**Why it fails**: all fields must consistently use snake_case. Mixing conventions causes some fields to be unrecognized and data to be lost.

## Troubleshooting common issues

### Data not appearing in applications

**Problem**: data ingested via Turbine Schema is not appearing in your application records.

**Solutions**:

- Verify field keys match Turbine Schema conventions exactly (case-sensitive, underscore-separated).
- Check that field types match the schema (for example, array fields for multi-value data).
- Ensure required fields are present in your data.
- Validate your data structure against the Turbine Schema before ingestion.

### Schema validation errors

**Problem**: playbook actions fail with schema validation errors.

**Solutions**:

- Review the error message to identify which field is causing the issue.
- Compare your data structure to the Turbine Schema definition.
- Check for typos in field names (for example, `alert_uid` vs. `alert_ui`).
- Ensure nested objects follow the correct structure (for example, observable objects within arrays).

### Integration compatibility issues

**Problem**: components or connectors are not working together as expected.

**Solutions**:

- Verify all components use the same interface version.
- Check that input/output schemas match between connected components.
- Review the interface documentation to ensure you are using compatible interfaces.
- Test components individually before integrating them into larger workflows.

### Observable enrichment failures

**Problem**: observable enrichments are not being applied or are missing from results.

**Solutions**:

- Verify `observable_type` uses valid Turbine Schema values (for example, `ipv4_public`, not `ip` or `IP`).
- Ensure `observable_value` is properly formatted (for example, a valid IP address, URL, or hash).
- Check that enrichment provider components are correctly configured.
- Verify enrichment results follow the Enrichment schema structure:

  ```json
  {
    "enrichment_type": "reputation",
    "enrichment_provider": "virustotal",
    "enrichment_verdict": "malicious",
    "enrichment_timestamp": "2025-01-15T10:00:00Z"
  }
  ```

- Review playbook execution logs to identify which enrichment step failed.

### Date/time format issues

**Problem**: timestamp fields are not being recognized or parsed correctly.

**Solutions**:

- Ensure all datetime fields use ISO 8601 format: `YYYY-MM-DDTHH:MM:SSZ`.
- Verify the timezone is specified (use `Z` for UTC or `+HH:MM` for other timezones).
- Check that datetime fields are strings in JSON, not date objects.
- For relative times (for example, "4 hours ago"), ensure your transformation logic converts them to absolute timestamps before storing.

### Array vs. single-value confusion

**Problem**: data appears as a single value when it should be an array, or vice versa.

**Solutions**:

- Always use arrays for multi-value fields, even when there is only one value.
- Never use comma-separated strings for array fields.
- Check field definitions in your application to ensure array fields are configured as multi-select or array types.
- Use transformation functions to convert comma-separated strings to arrays if needed.

### Nested object structure problems

**Problem**: nested objects like observables, enrichments, or detection rules are not structured correctly.

**Solutions**:

- Verify nested arrays contain objects, not primitives.
- Ensure required fields are present in nested objects (for example, `observable_type` and `observable_value` for Observable objects).
- Check that nested object structures match the Turbine Schema exactly (field names, types, nesting levels).
- Use the Solutions Bundle interfaces as reference implementations.

### Field name typos and case sensitivity

**Problem**: fields are not being recognized due to typos or case mismatches.

**Solutions**:

- Double-check field names against the relevant Turbine Schema reference document (field names are case-sensitive).
- Common typos to avoid:
  - `alert_uid` vs. `alert_ui` (missing "d")
  - `observable_type` vs. `observableType` (wrong case)
  - `email_from_address` vs. `email_from` (missing "_address")
- Use IDE autocomplete or schema validation tools to catch typos early.
- Compare your field names character by character with the schema definitions.

### Missing raw data fields

**Problem**: the `raw_alert` or `raw_email` field is missing or incorrectly formatted.

**Solutions**:

- Always include `raw_alert` (for Alert objects) or `raw_email` (for Email objects) as required fields.
- Store the complete original payload in the raw field.
- Ensure the raw field is a JSON object, not a string (unless your system requires string serialization).
- Preserve all original data to enable forensic analysis and debugging.

## References

- https://json-schema.org/
- https://attack.mitre.org/
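As a closing illustration, the transformation and raw-data guidance in this guide can be sketched in a few lines. This is a hypothetical ingestion step, not Swimlane code: the Turbine Schema field names (`alert_uid`, `alert_title`, `alert_categories`, `raw_alert`) come from this guide, while the upstream keys (`id`, `name`, `tags`) are assumptions about a source SIEM payload.

```python
import json
import uuid

def to_alert(raw_payload: str) -> dict:
    """Wrap an incoming SIEM payload in a minimal Alert-shaped dict."""
    source = json.loads(raw_payload)  # parsed copy, preserved whole in raw_alert

    tags = source.get("tags", "")
    return {
        # Required fields per this guide: alert_uid and raw_alert.
        "alert_uid": source.get("id") or str(uuid.uuid4()),
        "alert_title": source.get("name", "Untitled alert"),
        # Array fields must be arrays, never comma-separated strings.
        "alert_categories": [t.strip() for t in tags.split(",") if t.strip()],
        # Store the complete original payload as a JSON object, not a string.
        "raw_alert": source,
    }

alert = to_alert('{"id": "alert-12345", "name": "Suspicious activity", "tags": "phishing, malware"}')
print(alert["alert_categories"])  # ['phishing', 'malware']
```

Keeping the parsed payload in `raw_alert` (rather than the original string) satisfies the "JSON object, not a string" rule above and preserves every original field for forensic analysis, even ones the mapping does not carry over.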