Show HN: Programmatic – a REPL for creating labeled data https://ift.tt/dzHNJq9

Hey HN, I’m Jordan, cofounder of Humanloop (YC S20), and I’m excited to show you Programmatic: an annotation tool for building large labeled datasets for NLP without manual annotation.

Programmatic is like a REPL for data annotation. You:

1. Write simple rules/functions that can approximately label the data
2. Get near-instant feedback across your entire corpus
3. Iterate and improve your rules

Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds.

What we do differently from weak supervision packages like Snorkel/skweak [1] is to focus on the UI to give near-instantaneous feedback. We love these packages, but when we tried to iterate on labeling functions we had to write a ton of boilerplate code and wrestle with pandas to understand what was going on. Building a dataset programmatically requires you to grok the impact of labeling rules on a whole corpus of text. We’ve been told that the exploration tools and feedback make the process feel game-like and even fun (!!).

We built it because we see that getting labeled data remains a blocker for businesses using NLP today. We have a platform for active learning (see our Launch HN [2]), but we wanted to give software engineers and data scientists a way to build the datasets they need themselves and to make the best use of subject-matter experts’ time.

The package is free and you can install it now as a pip package [3]. It supports NER / span extraction tasks at the moment, and document classification will be added soon. To help improve it, we'd love to hear your feedback or any successes/failures you’ve had with weak supervision in the past.
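To make the three-step loop above concrete, here is a minimal sketch in plain Python. It does not use Programmatic's API: every name in it (`money_rule`, `org_suffix_rule`, `known_org_rule`, `coverage`, `majority_vote`) is hypothetical, and the simple vote count at the end is only a stand-in for the Bayesian label model described in the post.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A span is (start, end, label) in character offsets. Illustrative type only.
Span = Tuple[int, int, str]
Rule = Callable[[str], List[Span]]


def money_rule(text: str) -> List[Span]:
    """Label dollar amounts such as $5 or $1,200.50 as MONEY."""
    pattern = r"\$\d[\d,]*(?:\.\d+)?"
    return [(m.start(), m.end(), "MONEY") for m in re.finditer(pattern, text)]


def org_suffix_rule(text: str) -> List[Span]:
    """Label capitalized names ending in Inc/Ltd/Corp as ORG."""
    pattern = r"(?:[A-Z][a-z]+ )+(?:Inc|Ltd|Corp)\b"
    return [(m.start(), m.end(), "ORG") for m in re.finditer(pattern, text)]


def known_org_rule(text: str) -> List[Span]:
    """Label names from a small hand-written list as ORG."""
    spans: List[Span] = []
    for name in ("Acme Corp", "Globex"):
        spans += [(m.start(), m.end(), "ORG")
                  for m in re.finditer(re.escape(name), text)]
    return spans


RULES: List[Rule] = [money_rule, org_suffix_rule, known_org_rule]


def coverage(corpus: List[str], rules: List[Rule]) -> Dict[str, float]:
    """Fraction of documents in which each rule fires at least once."""
    return {
        rule.__name__: sum(bool(rule(doc)) for doc in corpus) / len(corpus)
        for rule in rules
    }


def majority_vote(text: str, rules: List[Rule], min_votes: int = 1) -> List[Span]:
    """Keep spans proposed by at least `min_votes` rules.

    A crude stand-in for a learned label model: it counts agreement
    instead of estimating how reliable each rule is.
    """
    votes = Counter(span for rule in rules for span in set(rule(text)))
    return sorted(span for span, n in votes.items() if n >= min_votes)


corpus = [
    "Acme Corp paid $1,200.50 to Globex.",
    "The invoice was for $5.",
    "Nothing to label here.",
]

if __name__ == "__main__":
    # Step 2: feedback across the whole corpus.
    print(coverage(corpus, RULES))
    # Aggregation: spans that at least two rules agree on.
    print(majority_vote(corpus[0], RULES, min_votes=2))
```

Iterating (step 3) then means editing a rule, rerunning `coverage`, and checking which spans changed. A real label model would additionally weight rules by their estimated accuracy rather than counting every vote equally.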
[1]: We use an HMM for NER tasks and Naive Bayes for classification, following the two approaches given in the papers below:
Pierre Lison, Jeremy Barnes, and Aliaksandr Hubin. "skweak: Weak Supervision Made Easy for NLP." https://ift.tt/rCsUQqy (2021)
Alex Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Chris Ré. "Data Programming: Creating Large Training Sets, Quickly." https://ift.tt/NpztrfE (NIPS 2016)

[2]: Our Launch HN for our main active learning platform, Humanloop – https://ift.tt/puJhGLo

[3]: You can install it directly here: https://ift.tt/OqgB267...

https://ift.tt/T1xHpaS April 8, 2022 at 05:35PM
