Harvard and Google to release 1 million public-domain books as AI training dataset | TechCrunch

News Source: TechCrunch

News Summary

  • Harvard University launches the Institutional Data Initiative (IDI). The IDI has financial backing from Microsoft and OpenAI
  • The dataset includes 1 million public-domain books, spanning genres, languages, and authors including Dickens, Dante, and Shakespeare
  • The new dataset isn’t available yet, and it’s not clear when or how it will be released
AI training data has a big price tag, one best suited for deep-pocketed tech firms. This is why Harvard University plans to release a dataset that includes in the region of 1 million public-domain bo [+970 chars]
