Show HN: Programmatic – a REPL for creating labeled data Hey HN, I’m Jordan cofounder of Humanloop (YC S20) and I’m excited to show you Programmatic — an annotation tool for building large labeled datasets for NLP without manual annotation . Programmatic is like a REPL for data annotation. You: 1. Write simple rules/functions that can approximately label the data 2. Get near-instant feedback across your entire corpus 3. Iterate and improve your rules Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds. What we do differently from weak supervision packages like Snorkel/skweak[1] is to focus on UI to give near-instantaneous feedback. We love these packages but when we tried to iterate on labeling functions we had to write a ton of boilerplate code and wrestle with pandas to understand what was going on. Building a dataset programmatically requires you to grok the impact of labeling rules on a whole corpus of text. We’ve been told that the exploration tools and feedback makes the process feel game-like and even fun (!!). We built it because we see that getting labeled data remains a blocker for businesses using NLP today. We have a platform for active learning (see our Launch HN [2]) but we wanted to give software engineers and data scientists a way to build the datasets needed themselves and to make best use of subject-matter-experts’ time. The package is free and you can install it now as a pip package [2]. It supports NER / span extraction tasks at the moment and document classification will be added soon. To help improve it, we'd love to hear your feedback or any success/failures you’ve had with weak supervision in the past. [1]: We use a HMM model for NER tasks, and Naive-Bayes for classification using the two approaches given in the papers below: Pierre Lison, Jeremy Barnes, and Aliaksandr Hubin. "skweak: Weak Supervision Made Easy for NLP." https://ift.tt/rCsUQqy (2021) Alex Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Chris Ré. "Data Programming: Creating Large Training Sets, Quickly" https://ift.tt/NpztrfE (NIPS 2016) [2]: Our Launch HN for our main active learning platform, Humanloop – https://ift.tt/puJhGLo [3]: Can install it directly here https://ift.tt/OqgB267... https://ift.tt/T1xHpaS April 8, 2022 at 05:35PM
Women Pioneers at Muni: Adeline Svendsen and Muni’s First Newsletter By Jeremy Menzies To close out Women’s History Month, here’s a look back at one woman whose work to bring Muni staff together in the late 1940s created a legacy that lives on to this day. Adeline “Addy” Svendsen was founding editor of Muni’s first internal newsletter, “ Trolley Topics .” Adeline Svendsen sits at her desk in the Geneva Carhouse office building in this 1949 shot. Trolley Topics was a new venture when it started in February 1946. As Svendsen wrote in the first issue it was created, “to bring a little fun, a little news, and a lot of good will to all our fellow employees in the Railway.” Just two years prior in 1944, Muni merged with the Market Street Railway Company, expanding the small municipal operation into the largest transit provider in the city with hundreds of employees, vehicles of every shape and size, and dozens of facilities scattered across town. The newsletter was meant to help unite ...
Comments