• Embedding lecture notes and lecture video transcripts using Flair and BERT.

  • Comparing the embedded vectors using an NN Classifier.

    • Initially tried cosine distance.
    • Cosine distance tends to discriminate poorly when the embedding dimension is high.
  • Data:

    • Summarized notes of TED Talks found online.
  • Linking process:

  • NN:

    • Eventually, time data will be incorporated.
  • Potential future challenges:

    • Is it sufficient to link only to the beginning of the note?
    • Are there any issues with using very long or very short sentences as they are?
      • Very long sentences usually come from continuous, unbroken speech.
    • Is it enough to just apply BERT?
    • Are there any biases in the selection process (e.g. video length)?
  • Thoughts:

    • Describing the desired parts of a video in natural language could be used to generate subtitles.
    • Can we learn that “XXX” indicates a quote? (Might be difficult with BERT)
    • Can we learn what should and should not be written down in notes? #NaturalLanguageProcessing #Minerva
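
The cosine-distance comparison that was tried first can be illustrated with a minimal sketch. It assumes the note sentences and transcript sentences have already been embedded (for example with Flair and BERT); the toy 3-dimensional vectors and the helper names `cosine_distance` and `link_notes_to_transcript` are hypothetical and only stand in for real embeddings and the actual linking code, which these notes do not describe.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def link_notes_to_transcript(note_vecs, transcript_vecs):
    """For each note vector, return the index of the closest transcript vector."""
    links = []
    for note in note_vecs:
        distances = [cosine_distance(note, t) for t in transcript_vecs]
        # Index of the smallest distance, i.e. the best-matching transcript sentence.
        links.append(min(range(len(distances)), key=distances.__getitem__))
    return links


# Toy vectors standing in for real sentence embeddings (assumption, not real data).
note_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
transcript_vecs = [[0.0, 2.0, 2.0], [3.0, 0.1, 0.0], [0.0, 0.0, 1.0]]

links = link_notes_to_transcript(note_vecs, transcript_vecs)
print(links)
```

Each entry of `links` is the index of the transcript sentence matched to the corresponding note sentence. Because cosine distance ignores vector length, `[0.0, 1.0, 1.0]` and `[0.0, 2.0, 2.0]` count as an exact match; replacing this step with a trained classifier, as the notes describe, would change only the matching function.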