Extracting Questions at Various Levels of Abstraction

The hypothesis: a database built by extracting “questions that can be answered by the text” at various levels of abstraction could be useful.

Prompt

Please create sets of three questions at different levels of abstraction - "specific questions that can be answered by encountering this page", "questions slightly generalized without using technical terms that can be raised", and "further abstracted questions expressed in a general form". When generalizing or abstracting questions, please write questions that are clear and meaningful.
  • This is the kind of prompt I am using.
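
A minimal sketch of how the prompt above might be wired into a script; it is not the code actually used for this page. The three key names follow the output examples quoted further down in these notes. The JSON-format instruction, the `llm` callable (any function that takes a prompt string and returns the model's reply), and the name `extract_questions` are assumptions made for illustration.

```python
import json
from typing import Callable

# The prompt quoted above.
PROMPT = (
    "Please create sets of three questions at different levels of abstraction - "
    '"specific questions that can be answered by encountering this page", '
    '"questions slightly generalized without using technical terms that can be raised", '
    'and "further abstracted questions expressed in a general form". '
    "When generalizing or abstracting questions, please write questions that are "
    "clear and meaningful."
)

# Assumption: the model is asked for a JSON list so the reply can be parsed.
FORMAT_HINT = (
    "Answer with a JSON list of objects, each with the keys "
    '"concrete_question", "general_question", and "abstract_question".'
)

KEYS = ("concrete_question", "general_question", "abstract_question")


def extract_questions(page_text: str, llm: Callable[[str], str]) -> list[dict[str, str]]:
    """Send one page to `llm` and return only the well-formed question sets."""
    reply = llm(f"{PROMPT}\n{FORMAT_HINT}\n\n{page_text}")
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(data, list):
        return []
    sets = []
    for item in data:
        # Keep a set only if all three levels are present and non-empty.
        if isinstance(item, dict) and all(
            isinstance(item.get(k), str) and item[k].strip() for k in KEYS
        ):
            sets.append({k: item[k].strip() for k in KEYS})
    return sets
```

Passing the model as a plain callable keeps the sketch independent of any particular client library, and dropping malformed sets early means later steps can rely on all three levels being present.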

Originally, I was planning to do this with books, but extracting individual pages without their context often produces confusing results. Therefore, I would like to try this first with the content from /nishio.

Work Log

  • Extracted articles of 500-1500 characters from /nishio

    • Total of 5000 pages
  • Which model to use

    • Comparing 4o and 4o-mini, 4o is indeed better at abstracting questions
      • 4o-mini tends to provide slightly off-target abstractions
    • Processing all 5000 pages would cost a few thousand yen with 4o and a few hundred yen with 4o-mini
  • Should I just do 500 pages with 4o? (/villagepump/blu3mo)

    • output_questions.json
    • image
      • Seems quite good (blu3mo)(blu3mo)(blu3mo)
        • Benefits
          • With this data, RAG can retrieve specific knowledge from abstract questions
            • Conventional RAG simply measures the distance between the question and the documents, so an abstract question often fails to retrieve the appropriate specific information
            • Since the LLM does the abstraction in advance, it seems possible to retrieve specific knowledge from abstract questions
          • Interesting (/villagepump/cak)
        • Humans can generate and understand abstract questions even without understanding the specific content or context of the text.
          • Someone who is not aware of the KJ Method would not come up with the question “What specific tools are available for conducting KJ Method using digital tools?” and would not be able to understand it.
          • However, the question “What methods are effective for organizing information?” does occur to them, and they can understand it.
          • When considering human usage, it seems important to be able not only to “create” questions but also to “understand” them.

  • Additional Note: The distinction between concrete and abstract can also be described as high context dependency versus low context dependency.

    • For instance, concrete and abstract questions in a book by Ryūju:
      • “concrete_question”: “What does ‘三世実有法体恒有’ mean in the Abhidharma, and how is it proven?”

        • It’s unclear. Who are the Abhidharma? (/villagepump/blu3mo)
      • “general_question”: “What is the essence of things that exist across past, present, and future, and how can their existence be confirmed?”

        • With undefined terms like “existence,” “essence,” and “past,” the discussion becomes ambiguous/abstract, but still understandable. (/villagepump/blu3mo)
      • “abstract_question”: “What is existence? How should it be defined beyond time?”

        • Though it becomes more ambiguous, one can roughly understand what the discussion is about as an open-ended question. (/villagepump/blu3mo)
    • In simple terms, is it okay to understand that being too concrete is like using too much jargon? (/villagepump/はるひ)
      • Context dependency could also be considered a narrow form of specialized terminology.
  • There seem to be various ways to support thinking that moves back and forth between abstract and concrete concepts.

  • I want to do this with physical books as well (blu3mo)(blu3mo).

    • I want to think about ways to supplement context appropriately.
    • I have a feeling that something amazing would happen if I processed all ten thousand interesting books and built a database of “abstract questions.” (/villagepump/blu3mo)
      • It might be possible to find that “Page 234 of Book A” and “Page 51 of Book B” offer different perspectives on the same abstract question.
      • Or, abstracting “Problem X” could reveal that a description on “Page 49 of Book C” is useful.
      • Generally, activities like reading comprehension involve moving back and forth between abstraction and concreteness. (/villagepump/はるひ)
        • +1 (/villagepump/blu3mo)
        • Is that true? (/villagepump/はるひ)
          • It feels more like a kind of interpretation.
        • What’s the difference between having a list of questions and just asking ChatGPT to explain things clearly?
          • I’m still organizing my thoughts, so any feedback is welcome (blu3mo)(blu3mo)(blu3mo)
            • Let me try to write a tentative response:
            • In the example of Ryūju, without knowing about Ryūju, one cannot question what “三世実有法体恒有” means in the Abhidharma.
            • However, one can question abstractly, “What is the essence of things that exist across past, present, and future?”
              • (It could also be good to arrive at this by abstracting a different concrete question with the help of AI).
            • With a “question list,” one can use abstract questions as queries to search for specific knowledge or visualize the world of abstract questions. (/villagepump/blu3mo)
      • Going off on a tangent, it would be nice to have a fractal summarizing interface for moving between abstract and concrete here (/villagepump/blu3mo).
    • (After seeing an example comparing similarities with Cosense) This seems useful even for reading a single book, since it connects distant paragraphs without the effort of actively linking them yourself in Cosense (/villagepump/はるひ).
  • While it is interesting, ultimately, without designing the interface between humans and a “database of abstract questions” well, the desired outcome cannot be achieved.

    • The crucial aspect has not yet been resolved.
    • For now, I would like to try using something like Talk to the City in a rough manner (/villagepump/blu3mo).
  • The goal is to have a mechanism where one can utilize knowledge from a large number of books without having to read them all.

September 15, 2024

  • Image Image

  • Ran TTTC (Talk to the City) on 2500 “abstract questions” extracted from the 500 pages of /nishio.

    • Pages that discuss completely different topics at the concrete level, but share similar abstract themes, are placed close together.
    • For demonstration purposes, I randomly selected two points that are visually close to each other and attached their pages.
      • Example 1:
        • What factors accelerate the loss of diversity among participants due to situations lacking fairness?
        • How can mechanisms be created in society to feel fairness?
        • This is interesting (blu3mo)(blu3mo)(blu3mo)
          • An analogy can be drawn between “gender in Werewolf Match” and “young and elderly in Silver Democracy.”
          • Beyond that, I want to explore the concept of “drawing analogies” (/villagepump/blu3mo)
            • Image
            • That’s an interesting point (/villagepump/blu3mo)
      • Example 2:
      • Example 3:
    • Furthermore, by referring to the steps of abstraction by LLM, the connection between pages and abstract questions can be understood.
      • For instance, if the connection between the page /nishio/Werewolf Match and the question “What factors accelerate the loss of diversity among participants due to situations lacking fairness?” is not clear, following the flow below can help understand the connection:
        • Abstract Level 3: “What factors accelerate the loss of diversity among participants due to situations lacking fairness?”
        • -> Abstract Level 2: “What are the causes of specific groups being unfairly disadvantaged in the game environment or social system?”
        • -> Abstract Level 1: “Is there a possibility of unfairly excluding female players in this app’s system?”
        • -> /nishio/Werewolf Match
  • It might be beneficial to establish a more structured framework for these “abstract questions” here.

    • It would be helpful to articulate what makes a good question.
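
The step-by-step trace from an abstract question back to its page, as in the Werewolf Match example above, can be sketched as a simple lookup. This is an illustration under assumptions, not the tooling used here: the `QuestionSet` record, the function `trace_to_page`, and the mapping of Abstract Levels 1, 2, and 3 onto the `concrete_question`, `general_question`, and `abstract_question` fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSet:
    page: str
    concrete_question: str  # assumed to correspond to Abstract Level 1
    general_question: str   # assumed to correspond to Abstract Level 2
    abstract_question: str  # assumed to correspond to Abstract Level 3


def trace_to_page(abstract_question: str, records: list[QuestionSet]) -> list[str]:
    """Return the chain Level 3 -> Level 2 -> Level 1 -> page, or [] if unknown."""
    for record in records:
        if record.abstract_question == abstract_question:
            return [
                f"Abstract Level 3: {record.abstract_question}",
                f"Abstract Level 2: {record.general_question}",
                f"Abstract Level 1: {record.concrete_question}",
                record.page,
            ]
    return []


# The example from the text above.
records = [
    QuestionSet(
        page="/nishio/Werewolf Match",
        concrete_question=(
            "Is there a possibility of unfairly excluding female players "
            "in this app’s system?"
        ),
        general_question=(
            "What are the causes of specific groups being unfairly disadvantaged "
            "in the game environment or social system?"
        ),
        abstract_question=(
            "What factors accelerate the loss of diversity among participants "
            "due to situations lacking fairness?"
        ),
    )
]

for step in trace_to_page(records[0].abstract_question, records):
    print("->", step)
```

Keeping the intermediate levels next to each abstract question is what makes the trace possible; storing only the final abstraction would lose the explanation of why a page is attached to it.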

September 15, 2024

  • The content from just 500 pages of /nishio is already fascinating, and my excitement level is quite high.
    • 500 pages is a manageable amount to understand, but going to 5000 pages might add too much noise, making it hard to comprehend.
    • However, at the moment, I am not manually eliminating noise, so it feels scalable.
  • I want to add other books here and see what happens.
    • First, I will include a book by Ryūju.
    • output_questions_Ryūju.json
      • Created (/villagepump/blu3mo)
  • Instead of summarizing, the focus is on reducing context-dependency through abstraction.
    • There is a sense of being in a different paradigm from the traditional approach of “helping to understand the content of a book by summarizing” (blu3mo).
    • I see! (/villagepump/はるひ)
      • So, it’s like wanting to do it with ten thousand books.
  • By performing the heavy computation for abstraction in advance and creating a dictionary, it seems like many possibilities can be derived from there (/villagepump/blu3mo).

This is like a fully automatic generation of Helpfeel (/villagepump/bsahd).

  • Yes (/villagepump/blu3mo)
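
The idea of doing the abstraction in advance and then looking things up in the resulting dictionary can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline used in these notes: `embed` is a bag-of-words stand-in for a real embedding model, the page names in `question_sets` are hypothetical, and `build_index` and `search` are names made up for this example. The point it shows is that the query is matched against the pre-generated questions at every level rather than against the page text.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag of words."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(question_sets: dict[str, list[str]]) -> list[tuple[Counter, str, str]]:
    """Precompute one vector per question, at every abstraction level."""
    return [
        (embed(question), question, page)
        for page, questions in question_sets.items()
        for question in questions
    ]


def search(query: str, index: list[tuple[Counter, str, str]], top_k: int = 3):
    """Return up to top_k (page, matched question) pairs, best match first."""
    query_vec = embed(query)
    scored = sorted(
        ((cosine(query_vec, vec), question, page) for vec, question, page in index),
        key=lambda item: item[0],
        reverse=True,
    )
    return [(page, question) for score, question, page in scored[:top_k] if score > 0]


# Hypothetical page names; the questions are taken from the notes above.
question_sets = {
    "page-about-kj-method": [
        "What specific tools are available for conducting KJ Method using digital tools?",
        "What methods are effective for organizing information?",
    ],
    "page-about-werewolf-match": [
        "Is there a possibility of unfairly excluding female players in this app’s system?",
        "What factors accelerate the loss of diversity among participants "
        "due to situations lacking fairness?",
    ],
}

index = build_index(question_sets)
print(search("How can information be organized?", index))
```

A query phrased without the term “KJ Method” can still reach that page through its abstract question. With a real embedding model in place of `embed`, the same index could also be used to find pages from different books that sit close together on the same abstract question.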