Show HN: Capillaries: Distributed data processing with Go and Cassandra I started thinking about this approach after working on a large-scale project for a major financial company where our group developed a distributed in-house data processing solution. On a regular basis, it ingested a few gigabytes of financial data and, within a tight SLA time limit, produced a lot of enriched/aggregated/validated data for a number of customers. Sometimes, source data had errors, so operators with domain knowledge had to verify data validity at some checkpoints, immediately make corrections, and re-run parts of the workflow manually. The solution involved complex web service orchestration, custom database and was very demanding on the infrastructure availability. Capillaries is a built from scratch, open-source Go solution that does just that: ingests data and applies user-defined transforms - Go one-liner expressions, Python formulas, joins, aggregations, denormalization - using Cassandra for intermediate data storage and RabbitMQ for task scheduling. End users just have to provide: - source data in CSV files; - Capillaries script (JSON file) that defines the workflow and the transforms; - Python code that performs complex calculations (only if needed). The whole data processing pipeline can be split into separate runs that can be started independently and re-run by the user if needed. The goal is to build a platform that is tolerant to database and processing node failures, and allows users to focus on data transform logic and data quality control. “Getting started” Docker-based demo calculates ARK funds performance, using EOD holdings and transactions data acquired from public sources. There are also integration tests that use non-financial data. There is a test deploy tool that uses Openstack API for provisioning in the cloud. https://capillaries.io May 16, 2023 at 06:13AM
Women Pioneers at Muni: Adeline Svendsen and Muni’s First Newsletter By Jeremy Menzies To close out Women’s History Month, here’s a look back at one woman whose work to bring Muni staff together in the late 1940s created a legacy that lives on to this day. Adeline “Addy” Svendsen was founding editor of Muni’s first internal newsletter, “ Trolley Topics .” Adeline Svendsen sits at her desk in the Geneva Carhouse office building in this 1949 shot. Trolley Topics was a new venture when it started in February 1946. As Svendsen wrote in the first issue it was created, “to bring a little fun, a little news, and a lot of good will to all our fellow employees in the Railway.” Just two years prior in 1944, Muni merged with the Market Street Railway Company, expanding the small municipal operation into the largest transit provider in the city with hundreds of employees, vehicles of every shape and size, and dozens of facilities scattered across town. The newsletter was meant to help unite ...
Comments