Copyright Law

Discussion on the Service of Feeding Physical Books to AI

from /villagepump/2024/09/02

  • Customers using Scanbee need to obtain permission from the copyright holder for all books requested, except when the person ordering the service owns the copyright.

  • The service seems to operate on the premise that users have confirmed with the copyright holders (blu3mo)
    • Bookscan works on the same premise. They won’t scan books explicitly marked as not allowed (basis)
      • Users can determine this using a barcode scanner
      • They sometimes send back books that shouldn’t be scanned (nishio)
  • Is using book data for information analysis and machine learning an exception? (blu3mo)
    • This is concerning
    • If the original text is processed by an LLM without any human reading it, is permission from the copyright holder unnecessary?
    • Is Article 30-4 the basis for this?
      • If it’s for information analysis purposes, copying copyrighted material to a storage medium is allowed (same for machine learning)
      • However, the use must not be for the purpose of enjoyment
      • If so, the service would presumably not hand over the scanned PDF itself, but is that really how it operates?

(Use not intended for enjoying the ideas or emotions expressed in a copyrighted work) Article 30-4: A copyrighted work may be used in any way, to the extent deemed necessary, in the following cases and in any other case where the purpose is not to enjoy, or to let others enjoy, the ideas or emotions expressed in the work. However, this does not apply if the use would unreasonably harm the interests of the copyright holder in light of the type and purpose of the work and the manner of use.

  1. When used for testing purposes for the development or practical application of technology related to the recording, filming, or other use of copyrighted works
  2. When used for information analysis (extracting, comparing, classifying, or analyzing information related to language, sound, images, or other elements from a large number of copyrighted works or other large amounts of information)
  3. In addition to the above two cases, when using the copyrighted work in the process of information processing by an electronic computer without involving human perception of the expression of the work (excluding the execution of the work on an electronic computer in the case of program works).
  • When the purpose is not to enjoy the ideas or emotions expressed in the copyrighted work
    • For example, “paraphrasing and publicly sharing the entire content of a book” is likely not allowed since the purpose is to enable enjoyment (blu3mo)
      • It could also harm the interests of the copyright holder (blu3mo)
  • Without involving human perception
    • It’s interesting how it changes depending on whether a person perceives it (blu3mo)