
Launch HN: Cord (YC W21) – training data toolbox for computer vision https://ift.tt/3jDuBlx

Hey HN community - I’m Ulrik from Cord (https://cord.tech) in the current YC W21 batch [1]. We are building software that allows people to label their data intelligently using a toolbox of ‘labeling algorithms’. A labeling algorithm is any unit of intelligence (e.g. a pre-trained model or an interpolation algorithm) that helps automate the annotation process. This enables data science and machine learning teams to iterate rapidly on their ML models without having to farm out labeling tasks to an external workforce.

Today we’re launching the first part of our product, our Web App, which serves our initial set of automation features through a GUI. It also allows you to classify images, draw vector labels, visualize data, and perform collaborative QA.

Computer vision ML algorithms are widely used for tasks like detecting everyday objects such as cars and pedestrians. However, they have yet to see widespread adoption for things like detecting cancerous polyps during an endoscopic procedure or blood clots in MRI scans. Contemporary approaches are fueled by massive-scale labeled training datasets, and the lack of such datasets is often the blocking element in building ML applications that solve these more specialised tasks. We also believe that the core of an ML application's IP stems from the labeled data used to train it.

Creating these datasets is challenging for several reasons. Labeling the data requires expensive domain-expert annotators, and privacy constraints might prevent the data from being sent to an external workforce. As a result, most labeling work tends to be done using open-source tools that were neither built for speed nor purpose-built to handle massive-scale datasets [2]. These tools also tend to provide a poor experience for the end consumers of the training data (e.g. data scientists, ML engineers) because they lack intelligence and require a lot of manual input.
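To make the idea of an "interpolation algorithm" as a labeling algorithm more concrete, here is a minimal sketch in Python. It is not Cord's SDK or implementation: the `Box` type and the `interpolate_boxes` function are hypothetical names chosen for illustration, and plain linear interpolation between two manually labeled keyframes is an assumption about what such an algorithm could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for this sketch)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def interpolate_boxes(start_frame: int, start_box: Box,
                      end_frame: int, end_box: Box) -> dict[int, Box]:
    """Linearly interpolate a box for every frame between two keyframes.

    Returns a mapping {frame_index: Box} that includes both keyframes.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")

    span = end_frame - start_frame
    labels = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        labels[frame] = Box(
            x=start_box.x + t * (end_box.x - start_box.x),
            y=start_box.y + t * (end_box.y - start_box.y),
            w=start_box.w + t * (end_box.w - start_box.w),
            h=start_box.h + t * (end_box.h - start_box.h),
        )
    return labels


# Two manual labels (frames 0 and 4) produce labels for frames 0 through 4.
labels = interpolate_boxes(0, Box(0, 0, 10, 10), 4, Box(40, 20, 10, 10))
```

In this sketch an annotator draws only the first and last box; the intermediate frames are filled in automatically and would still need a quick visual review, since linear motion is only an approximation.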
The initial seed of the idea came while I was working on a CS master’s project on visualizing massive-scale medical image datasets. I saw how much time and effort doctors were spending on labeling data. I then met my co-founder Eric, who had worked as a quant researcher in finance, and we realized we could take an algorithmic approach to tackling the labeling problem. Instead of writing trading algorithms, we turned our focus to writing labeling algorithms.

For example, for a food calorie estimation project we translated image-level classifications of food items into individual bounding box labels using a labeling algorithm we wrote with our SDK, requiring only one manual label per food item. Although it was an image dataset, our algorithm approximated noisy bounding box labels by running a CSRT object tracker across images. It then trained a shallow Faster RCNN ‘micro-model’ on the noisy labels, ran inference on the data, and suppressed the earlier noisy labels. We then quickly reviewed and adjusted the results visually in our Web App [3]. We have applied a similar approach in areas such as gastroenterology [4] and pathology.

The days of relying on an army of human annotators and waiting to start the model building process are hopefully (soon) over. We are incredibly excited to be driving that change, and we are delighted to be sharing Cord with the HN community!

We would love to hear your feedback. How are you going about creating and managing training data today? What are your key constraints? If you have used a creative method to label your data before, please share. Thank you so much in advance!

[1] What I Learned From My First Month at Y Combinator - https://ift.tt/374heFH...
[2] Why You Should Ditch Your In-House Training Data Tools (And Avoid Building Your Own) - https://ift.tt/3rI5Mrl
[3] Label a Dataset with a Few Lines of Code - https://ift.tt/372UCFE...
[4] Pain Relief for Doctors Labelling Data - https://ift.tt/3jE73wE...
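As an addendum to the food calorie example above, here is a rough Python sketch of the final "suppress earlier noisy labels" step. The post does not say how suppression was done, so everything here is an assumption: the function names `iou` and `suppress_noisy_labels`, the `(x1, y1, x2, y2)` box format, and the rule that a tracker label is dropped when a sufficiently confident micro-model prediction overlaps it. It does not use Cord's SDK.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_noisy_labels(noisy_boxes, predictions,
                          min_score=0.5, iou_threshold=0.5):
    """Merge tracker labels with model predictions for one image.

    noisy_boxes: boxes produced by the tracker.
    predictions: (box, score) pairs produced by the micro-model.
    Assumed rule: keep confident predictions, and keep a tracker box only
    if no confident prediction overlaps it by at least iou_threshold.
    """
    confident = [box for box, score in predictions if score >= min_score]
    surviving_noisy = [
        box for box in noisy_boxes
        if not any(iou(box, pred) >= iou_threshold for pred in confident)
    ]
    return confident + surviving_noisy


# The first tracker box is replaced by the overlapping confident prediction;
# the second tracker box survives; the low-score prediction is ignored.
noisy = [(0, 0, 10, 10), (50, 50, 60, 60)]
preds = [((1, 1, 11, 11), 0.9), ((100, 100, 110, 110), 0.2)]
merged = suppress_noisy_labels(noisy, preds)
```

The thresholds are illustrative defaults rather than tuned values; in practice the merged labels would still go through visual review, as described in the post.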
February 11, 2021 at 11:06PM
