• When processing large amounts of data on a computer,

  • need to be minimized.

  • When trying to reduce computation time, it is common to create indexes.

    • This increases memory usage.
    • To improve this trade-off, the data structure needs to be designed carefully.
  • Methods to efficiently perform string searches (compression of indexes)

    • Suffix tree (a tree structure)
      • Stores a list of suffixes in a tree structure where each node represents one character.
      • It is efficient but uses a lot of memory.
    • Suffix array
      • A simple array that contains all suffixes sorted in lexicographical order.
      • It is lighter than a suffix tree.
    • Compressed suffix array
      • Features
        • Can retrieve the i-th element of the suffix array and its reverse array in time.
        • Can retrieve a substring of a string in time.
      • With these two features, the position of a suffix that is one character shorter than a given suffix can be determined.
        • Together with the original string T, this enables quick searches.
  • succinct representation

    • A representation that almost matches the lower bound of information theory in terms of size.

    • If there are L possible inputs, the lower bound is approximately bits.

    • Succinct index

      • A bit vector/array (e.g., {1,0,0,1,1,0,1}) with
        • rank operation: returns the number of 1s in a given range
          • Succinct index: Divides the vector into blocks to minimize its size and precomputes the answers to queries for each block?
            • (blu3mo) I’m not sure if I understand this correctly.
        • select operation: returns the position of the i-th 1.
    • Succinct representation of trees

      • For ordered trees (trees with a defined order among siblings)
        • LOUDS representation
          • Encodes the degrees of nodes in a breadth-first order using unary coding.
          • Unary coding: represents a number using a sequence of consecutive 1s.
            • Why? —> 0 is used as a separator.
          • This becomes a bit vector, so various tree operations can be performed using the rank and select operations of succinct index.
        • BP representation
          • A representation like (()((())())())(()())()).
          • Represents the hierarchical structure using parentheses, where ( represents 0 and ) represents 1.
          • To perform operations like findclose, a data structure called interval min-max tree is used.
            • It divides the structure into blocks and stores the maximum and minimum values for each block.
            • This allows the construction of a tree structure within each block.
            • This enables fwd_search.

(Information Science Expert)