Data Scientist

Date: Sep 5, 2023

Location: New York, NY, US, 10281

Company: Associated Press

The Associated Press is an independent global news organization dedicated to factual reporting. Founded in 1846, AP today remains the most trusted source of fast, accurate, unbiased news in all formats and the essential provider of the technology and services vital to the news business. More than half the world's population sees AP journalism every day.


The Associated Press’ Metadata and Data Science Team seeks a Data Scientist based in New York, NY. 


The Data Scientist will perform data analysis, evaluate commercial and open-source models, and help design data science and data engineering solutions supporting AI initiatives, news search and discovery, content enrichment and metadata generation.


The team works closely with various departments and functions across the organization to design, implement and manage end-to-end metadata, to maintain the integrity of schema standards, and to build solutions with data, analytics, and machine learning methods.


If you were working with us, here are some tasks you may have undertaken in the past month:

  • Partner with the search team to define, fine-tune and test NLP and machine learning capabilities to improve the customer search experience

  • Set up and execute annotation tasks to build high-quality training datasets; use your knowledge of current best practices and available datasets to research and find the best-matching data to supplement training tasks

  • Fine-tune existing statistical and machine learning models used in production; measure performance improvements and present findings to stakeholders

  • Combine, clean and pre-process Google Analytics and other data from various sources to ensure accuracy and usability for analysis or modeling

  • Analyze news metadata to find problems and propose fixes that support content search and discovery

  • Help stakeholders understand and evaluate opportunities to use GenAI in their platforms

  • Develop a prototype for matching and identifying component assets within video clips, using tools like AWS Step Functions and SageMaker


We are seeking a candidate who:

  • approaches problems with curiosity and an eagerness to find creative solutions and make recommendations with an eye to costs and benefits

  • is happy digging deep into messy data problems, which may include data wrangling and coordinating across the organization, to propose a range of solutions and implement the best candidate

  • proactively explores emerging technologies and is conversant with the latest analytics, data science, NLP and machine learning techniques

  • is comfortable working with data models and data science platforms and processes with a focus on quality and performance

  • is eager to learn about the relevant technical details of a large-scale media operation, has patience for imperfect systems and workflows, and can self-organize in order to effectively take on several projects simultaneously

  • understands their audience and can clearly communicate analysis and present PoC demos at a level suited to that audience

  • has an analytical mindset and strong problem-solving skills with the ability to think critically about data-related challenges

  • helps grow the culture of data-driven decision making and supports colleagues as they develop data literacy skills



Qualifications:

  • At least 3-5 years of relevant experience

  • Hands-on experience using data analysis and data science tools and methods, such as NLP techniques, LLMs, ETL, Amazon SageMaker, Amazon Rekognition, Google Vertex AI, Vision, AutoML, Power BI, Pandas and NumPy

  • Familiarity with analytics and data science best practices; adept at predictive modeling

  • Familiarity with AWS services, including Step Functions, Athena, Lightsail, S3, EC2 and Lambda

  • Confidence implementing and demonstrating PoCs

  • Adept at Python and able to collaborate with colleagues working in R

  • Comfort working with standard data formats and schemas (XML, JSON, etc.); understanding of data transformation methods

  • Familiarity with standard query and transformation languages such as XSLT, SPARQL and XQuery


Advanced-level professional competency in written and spoken English language is required. Authorization to work in the United States for any employer is mandatory.

The Metadata Team is based in New York City on a hybrid basis and strongly prefers local candidates, but we encourage all qualified candidates to apply.


The anticipated salary range for this position is $105,000 - $120,000, contingent on experience and other job-related factors. Employees are eligible to participate, according to the terms of the official plan documents, in a 401(k) plan and an employer-sponsored health insurance plan, and are eligible for paid time off and holidays in accordance with AP policy.


The application deadline is 11:59 p.m. ET on September 26, 2023.


AP seeks to build an inclusive organization grounded in respect for differences. We support all aspects of diversity and provide equal employment opportunity to all employees and applicants without regard to race, color, religion, sex, marital status, national origin, age, sexual orientation, gender identity, disability or status as a veteran. We encourage members of traditionally underrepresented communities to apply.
