Natural Language Processing

(From the perspective of an expert in information science)

  • Two types of “meaning”: word-level meaning / structural meaning

  • Word-level meaning

    • Includes things like the semantic proximity captured by word2vec vectors
    • Includes relations such as synonymy and hypernymy/hyponymy
    • Synonymy, similarity, hypernym/hyponym relations, etc. exist as structured data in thesauri and ontologies
      • For example, WordNet
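A minimal sketch of how such relational data might be stored and queried, using a hypothetical hand-built toy ontology (real WordNet is far richer, with synsets, glosses, and many more relation types):

```python
# Toy ontology: hypothetical data for illustration, not real WordNet entries.
# Each entry maps a word to its synonyms and its hypernym (broader term).
ontology = {
    "dog":    {"synonyms": {"hound"}, "hypernym": "canine"},
    "canine": {"synonyms": set(),     "hypernym": "mammal"},
    "mammal": {"synonyms": set(),     "hypernym": "animal"},
    "animal": {"synonyms": set(),     "hypernym": None},
}

def hypernym_chain(word):
    """Walk up hypernym links, analogous to WordNet's hypernym paths."""
    chain = []
    while word is not None:
        chain.append(word)
        word = ontology[word]["hypernym"]
    return chain

print(hypernym_chain("dog"))  # ['dog', 'canine', 'mammal', 'animal']
```

Following the hypernym links upward gives a chain from specific to general, which is exactly the kind of hierarchy that is tedious to build and maintain by hand.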
  • Structural meaning

  • Creating thesauri and ontologies manually is extremely labor-intensive, and they require regular updates

    • It would be great if we could build them automatically from vast amounts of text data (e.g. Twitter) (= a corpus)
    • The surrounding context provides the clues (blu3mo)
    • Based on the distributional hypothesis: words with similar meanings tend to appear surrounded by similar words
    • word2vec uses machine learning to make the vector of a center word and the vectors of its surrounding (context) words closer
      • (Closer = larger dot product)
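The training idea above can be sketched as a tiny skip-gram-with-negative-sampling loop, here in pure Python on a made-up corpus (a hypothetical toy setup, not the original word2vec implementation). Each positive (center, context) pair gets a gradient step that raises their dot product; randomly drawn noise words are pushed away:

```python
import math
import random

random.seed(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
dim = 8
# Separate "input" (center) and "output" (context) embeddings, as in word2vec.
W_in = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
W_out = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(epochs=200, lr=0.05, window=1, negatives=2):
    for _ in range(epochs):
        for i, center in enumerate(corpus):
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j == i:
                    continue
                # Positive pair (label 1.0): raise the dot product.
                samples = [(corpus[j], 1.0)]
                # Negative samples (label 0.0): random words, pushed away.
                samples += [(random.choice(vocab), 0.0) for _ in range(negatives)]
                for word, label in samples:
                    g = lr * (label - sigmoid(dot(W_in[center], W_out[word])))
                    for d in range(dim):
                        W_in[center][d], W_out[word][d] = (
                            W_in[center][d] + g * W_out[word][d],
                            W_out[word][d] + g * W_in[center][d],
                        )

before = dot(W_in["cat"], W_out["sat"])
train()
after = dot(W_in["cat"], W_out["sat"])
print(before, after)  # the co-occurring pair's dot product should grow
```

After training, words that co-occur end up with a larger dot product, which is the "closer" in the note above; with a large real corpus this is what makes the learned vectors useful as word-level meaning.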