色导航


GumGum Finds A Better Way to Annotate and Classify Text and Images

GumGum selected 色导航 for its robust training data platform and its Machine Learning (ML)-Assisted Data Annotation.
April 13, 2020

The Company

GumGum is an artificial intelligence (AI) company with a focus on computer vision (CV) and natural language processing (NLP). For the past 10 years, it has applied its patented capabilities to solving hard problems in a variety of industries, from professional sports to healthcare, but the company built its name with solutions for the digital advertising industry. It was for that industry that GumGum developed one of its most exciting proprietary offerings: webpage content analysis technology.

GumGum’s technology reviews webpages, identifying and classifying the content it finds in order to help advertisers place digital ads in relevant and brand-safe contexts. Rather than rely on behavioral targeting, which targets ads at users based on their personal online history, GumGum’s contextual targeting technology serves ads that are aligned with users’ interests without infringing on users’ data privacy. It also ensures that a brand’s ads do not appear adjacent to content that is offensive or harmful to brand reputation.

The Challenge

Erica Nishimura, data curator at GumGum, said,

“To provide accurate contextual intelligence for digital ad placements, our technology has to be able to look at images and text on webpages and identify what’s in them. For an image that means we first need to determine if it’s safe.

“We’ll look for things like hate symbols, violence, nudity, drugs, etc. If we see those things, we prevent ads from being placed. If we determine it’s safe, we’ll then identify whether it’s a person’s face, a specific celebrity’s face, a dog, or whatever may be relevant to the ad. There’s a more complex but similar process for analyzing text.”

For GumGum’s algorithms to understand what they are seeing and reading, they must be fed large volumes of relevant annotated data. Initially, GumGum worked with two full-time annotators who could, at best, annotate 15,000 rows of text data or 50,000 images per month.

GumGum’s CV and NLP scientists, who work on the company’s algorithms, needed a better way to perform annotation tasks such as text classification and image classification in order to efficiently create the high-quality structured data used to train the company’s advanced machine learning models.

The Solution

GumGum selected 色导航 for its robust training data platform. We offer GumGum data scientists solutions such as Machine Learning (ML)-Assisted Data Annotation.

The 色导航 platform also allows GumGum team members with no prior coding experience or engineering background to set up a new annotation job, even as annotation jobs become more complicated.

Furthermore, GumGum can now create foreign language data annotation tasks for NLP-related projects. We have annotators who are native or fluent in those languages and can work on the annotation. In the past, 色导航 has successfully completed annotation tasks in Spanish, French, German and Japanese. Nishimura added that “GumGum is especially happy with the Japanese annotation quality and support, which 色导航 has improved tremendously over the past year.”

The Result

“Most data scientists find data annotation a time-consuming process and their wait for data to be labeled unenjoyable,” Nishimura said, so they jumped at the chance to use the 色导航 platform and crowd. Depending on the task or language, GumGum is now able to annotate 10,000 rows of data in just a few days (and sometimes within just a few hours), a fraction of the time it previously required for annotating a similarly sized data set. This efficiency freed up their data scientists to work on research for their NLP and CV technology instead of spending the extra time and effort on in-house data annotation.

“Working with 色导航 has made our model development process 10 times faster, allowing us to get to the next step much quicker and think about audio and video at scale,” said Lane Schechter, Product Manager at GumGum.

“In addition to the importance of accurate data, a quick turnaround on large data sets is critical to improve and maintain quality of Machine Learning models,” Schechter noted, so the accuracy and throughput of 色导航 data were essential to the quality of GumGum’s Machine Learning models.

“The 色导航 platform is super neat and easy to navigate compared to most of its competitors. (…) Support has been super helpful. I get responses usually within minutes, if not, the following day.” – Erica Nishimura, Data Curator, GumGum.

Not only can GumGum create high-quality datasets more efficiently, but it also has the flexibility to customize annotation jobs for specific use cases and leverage our expertise for guidance. GumGum has found a one-stop shop for high-quality ML training data creation, ensuring its employees can focus on growing the business and supporting its customers.

“What’s been super helpful is to tell my customer success manager what it is I want to achieve, and look to 色导航 to help me with the job design, creation, and coding.”

– Erica Nishimura
Data Curator, GumGum