Supercharging Snowflake Ingestion with Apache Arrow, ADBC, and OCSF
Snowflake has become a go-to data platform for security and data teams looking to manage and analyze massive volumes of events. At Tenzir, we’ve recently unveiled a new integration that allows you to seamlessly stream data from Tenzir pipelines into Snowflake—while staying columnar under the hood. This post explains how we leverage Apache Arrow and Arrow Database Connectivity (ADBC) to efficiently deliver structured data (often in OCSF format) to Snowflake at high throughput.
ADBC: Arrow's Answer to ODBC/JDBC
In the database world, ODBC and JDBC have long been the standard interfaces to connect with databases. They often transmit data row by row, which can be inefficient for large-scale or streaming workloads.
Arrow Database Connectivity (ADBC) flips this model by using the columnar format of Apache Arrow for data exchange. Think of it as the Arrow-based equivalent of ODBC/JDBC. Because Arrow is columnar, ADBC can:
Eliminate Row-by-Row Overhead: Bulk ingest data in chunks with far fewer conversions.
Deliver High-Throughput Streaming: Move sizable data batches efficiently.
Offer Interoperability: Once in Arrow, data can be processed by a variety of tools without repeated conversions.
ADBC also ships drivers for existing databases, so columnar access works out of the box.
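To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It does not use the Arrow or ADBC libraries; the helper name `to_column_batches` and the sample fields are made up for illustration. It groups incoming rows into fixed-size chunks and pivots each chunk into per-column lists, which is roughly the shape a columnar interface exchanges instead of individual rows.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
ColumnBatch = Dict[str, List[Any]]


def to_column_batches(rows: Iterable[Row], batch_size: int) -> Iterator[ColumnBatch]:
    """Group rows into chunks of batch_size and pivot each chunk into columns."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Collect field names in first-seen order; absent values become None.
        fields: List[str] = []
        for row in chunk:
            for name in row:
                if name not in fields:
                    fields.append(name)
        yield {name: [row.get(name) for row in chunk] for name in fields}


# Hypothetical sample events.
rows = [
    {"ts": 1, "src_ip": "10.0.0.1", "bytes": 512},
    {"ts": 2, "src_ip": "10.0.0.2", "bytes": 2048},
    {"ts": 3, "src_ip": "10.0.0.1", "bytes": 128},
]
batches = list(to_column_batches(rows, batch_size=2))
```

With a batch size of 2, the three rows become two batches, and each batch holds one list per field rather than one record per event. A real Arrow record batch additionally carries a typed schema and contiguous buffers, which this sketch leaves out.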
Tenzir Pipelines: Built for Streaming
Tenzir is a data pipeline engine designed for continuous, high-throughput streaming. Its primary focus is on turning unstructured or partially structured data into structured, analytics-ready formats. Internally, Tenzir uses Apache Arrow record batches to shovel blocks of events between pipeline operators.
A common scenario involves normalizing (and enriching) security logs into the OCSF (Open Cybersecurity Schema Framework) schema. Because OCSF is designed for structured analytics, it pairs well with Tenzir’s columnar data path. The combination is a good fit for building out massive-scale detection and response capabilities in Snowflake.
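As a rough illustration of what such a normalization step does, the following plain-Python sketch maps a raw login record onto a handful of fields from the OCSF Authentication class. The input field names (`timestamp`, `user`, `client`, `result`) and the function name are hypothetical, only a small subset of OCSF attributes is populated, and this is not how Tenzir implements the mapping.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def to_ocsf_authentication(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical raw login record to a small OCSF Authentication subset."""
    class_uid = 3002  # OCSF Authentication class
    activity_id = 1   # Logon
    ts = datetime.strptime(raw["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    success = raw["result"] == "ok"  # assumed convention of the raw log
    return {
        "category_uid": 3,  # Identity & Access Management
        "class_uid": class_uid,
        "activity_id": activity_id,
        "type_uid": class_uid * 100 + activity_id,
        "time": int(ts.timestamp() * 1000),  # milliseconds since the Unix epoch
        "severity_id": 1,  # Informational
        "status_id": 1 if success else 2,  # 1 = Success, 2 = Failure
        "user": {"name": raw["user"]},
        "src_endpoint": {"ip": raw["client"]},
    }


raw_event = {
    "timestamp": "2024-01-01T00:00:00Z",
    "user": "alice",
    "client": "10.0.0.1",
    "result": "fail",
}
event = to_ocsf_authentication(raw_event)
```

Because every normalized event shares the same field names and types, a batch of them pivots cleanly into columns, which is what makes OCSF a natural match for a columnar data path.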
Arrow-to-Arrow Streaming to Snowflake
Our integration uses ADBC’s Snowflake driver to send Arrow batches directly from Tenzir pipelines into Snowflake. Under the hood, Tenzir treats the process as a streaming columnar data flow. Even though Snowflake ultimately relies on a “stage and load” mechanism, Tenzir’s perspective is that data flows in real time—no row-by-row overhead, no manual CSV or JSON exports to transitional storage. Here's how it works in practice:
Events Flow into Tenzir: You ingest logs, parse them, and optionally transform them into OCSF or any other structured schema Tenzir supports.
Events Remain in Arrow Format: Tenzir’s internal representation of structured data is columnar from the start.
Arrow Batches Go to ADBC: The Tenzir to_snowflake output operator passes Arrow record batches to the ADBC driver, which handles the rest.
On the Snowflake side, ingestion is a two-step process: the ADBC driver first stages the data, then loads it into Snowflake tables.
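The stage-then-load pattern can be pictured with a toy model in plain Python. Everything here is an assumption made for illustration: the class `StagedTableSink`, its row threshold, and the in-memory "table" are invented, and the sketch says nothing about how the ADBC Snowflake driver actually decides when to load. It only shows how a producer can keep pushing columnar batches continuously while the sink stages them and loads them in larger steps.

```python
from typing import Any, Dict, List

ColumnBatch = Dict[str, List[Any]]


class StagedTableSink:
    """Toy stage-then-load sink; a stand-in, not the ADBC driver's logic."""

    def __init__(self, load_threshold_rows: int) -> None:
        self.load_threshold_rows = load_threshold_rows  # hypothetical tuning knob
        self.staged: List[ColumnBatch] = []
        self.table: ColumnBatch = {}
        self.loads = 0

    def _staged_rows(self) -> int:
        # Row count of a batch is the length of any one of its columns.
        return sum(len(next(iter(b.values()), [])) for b in self.staged)

    def write(self, batch: ColumnBatch) -> None:
        """Step 1: stage the batch; trigger a load once enough rows are staged."""
        self.staged.append(batch)
        if self._staged_rows() >= self.load_threshold_rows:
            self.flush()

    def flush(self) -> None:
        """Step 2: load all staged batches into the table, column by column."""
        if not self.staged:
            return
        for batch in self.staged:
            for name, values in batch.items():
                self.table.setdefault(name, []).extend(values)
        self.staged.clear()
        self.loads += 1


sink = StagedTableSink(load_threshold_rows=4)
for batch in ({"id": [1, 2]}, {"id": [3, 4]}, {"id": [5, 6]}):
    sink.write(batch)  # the producer streams batches without waiting for loads
sink.flush()  # load whatever is still staged at shutdown
```

In this run the first two batches reach the threshold and are loaded together, and the final flush loads the third, so the producer sees a continuous stream while the sink performs two loads.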
Why This Matters
Tenzir primarily focuses on continuous streaming rather than large one-off bulk transfers, but the Arrow-based approach delivers tangible throughput benefits:
Minimized Serialization Overhead: Data remains in Arrow from Tenzir to Snowflake, removing extra conversion layers.
High Streaming Throughput: Columnar data batches can be quickly ingested. Even if Snowflake’s architecture stages data, the pipeline from Tenzir still pushes events in real time.
Future-Proofing: As Arrow and ADBC evolve, this integration will keep pace, leveraging improvements in Arrow’s columnar ecosystem and additional drivers.
Conclusion
By combining Tenzir's columnar streaming engine with ADBC and Apache Arrow, security and data professionals can efficiently pipe structured, analytics-ready data (like OCSF) into Snowflake. This integration removes the friction traditionally associated with row-based transfer methods, enabling high-throughput, real-time ingestion.
If you're curious about how an end-to-end columnar pipeline can improve your analytics workflows, give Tenzir’s Snowflake integration a try and let us know your thoughts. We're excited to continue refining and benchmarking this setup to help you push the boundaries of scalable security data pipelines!
Check out the Snowflake integration page in our documentation to learn more about how it works. And if your fingers are itching, you can try it out right away with our free Community Edition.