Using the GPT Index to explore the content of Scrapbox.

@robbalian: Hey GPT: When did I peak? I created a model that searches through thousands of pages of my emails and personal notes. You can use it too at https://t.co/tSHSzWoM6q Here’s what I learned… 🧵

  • Looks like I can use this approach on this collaborative Scrapbox, too.
import json
import os
 
# open the Scrapbox export JSON
with open('scrapbox_export.json', encoding='utf-8') as json_file:
    data = json.load(json_file)
 
# make sure the output directory exists
os.makedirs('data', exist_ok=True)
 
# iterate through the pages
for page in data['pages']:
    title = page['title']
    lines = page['lines']
    # join the lines with newline characters
    content = "\n".join(lines)
    # "/" in a title would otherwise be treated as a directory separator
    title = title.replace("/", "-")
    # write each page to its own txt file under data/
    with open("data/" + title + ".txt", "w", encoding="utf-8") as f:
        f.write(content)
  • For now, this converts the exported JSON into one txt file per page.
    • There seems to be an error with the /blu3mo page; it’s still unresolved.
  • With that, you can run semantic search and Q&A over the notes (a minimal query sketch is at the end of this page).
    • However, the retrieval step that pulls relevant files via semantic search doesn’t seem to be working well.
      • Could it be because of the Japanese language?
      • I need to understand how the embedding and indexing actually work (blu3mo) (see the embedding example near the end of this page).
        • I think it’s impossible with Japanese! (tkgshn)
        • I think it will work if we translate the notes into English before indexing them (a translation sketch is at the bottom of this page).
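  • As a sketch of the semantic search / Q&A step: the snippet below builds an index over the data/ directory and queries it. This assumes the gpt_index package as it existed around version 0.x (class names may differ in newer llama_index releases) and an OPENAI_API_KEY environment variable; treat it as a rough outline, not a tested implementation.
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
 
# load every txt file produced by the export script above
documents = SimpleDirectoryReader('data').load_data()
 
# build a simple vector index (embeds each chunk via the OpenAI API)
index = GPTSimpleVectorIndex(documents)
 
# persist the index so it doesn't have to be rebuilt every time
index.save_to_disk('index.json')
 
# semantic search + Q&A in one call: retrieve relevant chunks, then answer
response = index.query("What have I written about Scrapbox?")
print(response)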
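  • To get a feel for the embedding mechanism (and to check whether it struggles with Japanese), one can embed a few Japanese sentences directly and compare their cosine similarities. This is a sketch assuming the pre-1.0 openai Python client and the text-embedding-ada-002 model; the sentences are made-up examples.
import numpy as np
import openai
 
texts = [
    "Scrapboxのメモを検索したい",    # "I want to search my Scrapbox notes"
    "ノートから情報を探す方法",      # "How to find information in my notes"
    "今日の夕飯はカレーにする",      # "I'll have curry for dinner tonight" (unrelated)
]
 
# embed all three sentences in one request
resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
vecs = [np.array(d["embedding"]) for d in resp["data"]]
 
def cosine(a, b):
    # cosine similarity: higher means semantically closer
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
 
# if embeddings handle Japanese, the first pair should score higher than either pair with the third sentence
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))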
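  • If translating first turns out to be necessary, a hypothetical variant of the export loop could translate each page into English before writing it out. The translate_to_english helper below is an assumption (shown with text-davinci-003 via the pre-1.0 openai client); any translation API would do, and cost and rate limits are ignored here.
import openai
 
def translate_to_english(text):
    # hypothetical helper: ask a completion model to translate the page
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="Translate the following Japanese notes into English:\n\n" + text,
        max_tokens=1024,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()
 
# inside the export loop above, write the translated content instead:
# content_en = translate_to_english(content)
# with open("data/" + title + ".txt", "w", encoding="utf-8") as f:
#     f.write(content_en)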