Search Pipeline#
Search Index Configuration#
type: list
items:
  type: dict
  schema:
    glob: {type: str, required: true}
    docs_parser: {type: str}
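As an illustration of the schema above, the following minimal sketch shows one configuration that would satisfy it, plus a hand-written check. The glob patterns, the `markdown` parser name, and the `validate_config` helper are hypothetical and are not taken from the pipeline itself.

```python
# Hypothetical configuration matching the schema above: a list of dicts,
# each with a required `glob` string and an optional `docs_parser` string.
example_config = [
    {"glob": "docs/**/*.md", "docs_parser": "markdown"},  # parser name is illustrative
    {"glob": "notes/*.txt"},  # docs_parser omitted, which the schema allows
]


def validate_config(config):
    """Return a list of error messages; an empty list means the config is valid."""
    if not isinstance(config, list):
        return ["config must be a list"]
    errors = []
    for i, item in enumerate(config):
        if not isinstance(item, dict):
            errors.append(f"item {i}: must be a dict")
            continue
        if "glob" not in item:
            errors.append(f"item {i}: missing required key 'glob'")
        for key in ("glob", "docs_parser"):
            if key in item and not isinstance(item[key], str):
                errors.append(f"item {i}: '{key}' must be a str")
    return errors
```

Calling `validate_config(example_config)` returns an empty list, while an entry without `glob` produces a "missing required key" message.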
Search Index Data Structure#
search_index:
  type: dict
  schema:
    path: {type: str}
    versions:
      type: list
      items:
        - type: dict
          schema:
            content: {type: str}
            chunks:
              type: list
              items:
                - type: dict
                  schema:
                    content: {type: str}
                    embedding:
                      type: list
                      items: {type: float}
            facts:
              type: list
              items:
                - type: dict
                  schema:
                    content: {type: str}
                    embedding:
                      type: list
                      items: {type: float}
                    is_good: {type: boolean}
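To make the structure more concrete, here is a small illustrative instance. It assumes that `chunks` and `facts` are siblings inside each version and that `is_good` belongs to each fact; all values, including the three-dimensional embeddings, are made up. The `iter_good_facts` helper is hypothetical.

```python
# Illustrative instance of the search index structure; every value is invented.
search_index = {
    "path": "docs/intro.md",
    "versions": [
        {
            "content": "Full text of one version of the document.",
            "chunks": [
                {"content": "Full text of one version", "embedding": [0.1, 0.2, 0.3]},
                {"content": "of the document.", "embedding": [0.2, 0.1, 0.0]},
            ],
            "facts": [
                {
                    "content": "A self-sufficient fact extracted from the document.",
                    "embedding": [0.3, 0.1, 0.4],
                    "is_good": True,
                },
                {
                    "content": "A fact that was flagged as unusable.",
                    "embedding": [0.0, 0.5, 0.5],
                    "is_good": False,
                },
            ],
        }
    ],
}


def iter_good_facts(index):
    """Yield (content, embedding) for every fact whose is_good flag is true."""
    for version in index["versions"]:
        for fact in version["facts"]:
            if fact.get("is_good"):
                yield fact["content"], fact["embedding"]
```

How `is_good` is decided is not specified here; the helper only shows how a consumer might filter on it.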
Pipeline DAG#
Knowledge Extraction#
Prompt:
Format the following document as a list of self-sufficient evergreen facts. One per line. Include supporting context in each fact.
{text}
Postprocessing:
[line.strip(' \n-') for line in output.splitlines()]
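The prompt and postprocessing step above can be sketched as a pair of small functions. The names `build_extraction_prompt` and `postprocess` are hypothetical, and dropping empty lines is an added assumption; the list comprehension itself is the one shown above.

```python
from typing import List

EXTRACTION_PROMPT = (
    "Format the following document as a list of self-sufficient evergreen facts. "
    "One per line. Include supporting context in each fact.\n"
    "{text}"
)


def build_extraction_prompt(text: str) -> str:
    """Fill the document text into the knowledge extraction prompt."""
    return EXTRACTION_PROMPT.format(text=text)


def postprocess(output: str) -> List[str]:
    """Split model output into one fact per line, stripping spaces and dashes."""
    facts = [line.strip(" \n-") for line in output.splitlines()]
    # Assumption: blank lines carry no fact and are discarded.
    return [fact for fact in facts if fact]


sample_output = "- Fact one.\n- Fact two.\n\n  - Fact three.\n"
print(postprocess(sample_output))  # ['Fact one.', 'Fact two.', 'Fact three.']
```

Note that `strip(' \n-')` removes spaces, newlines, and hyphens from both ends of each line, so it handles `-` bullets but leaves other markers such as `*` or `1.` in place.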
Retrieval-Augmented Generation#
Prompt:
Answer the following question using only the context below. Only include information specifically discussed. Copy the answer verbatim from the context. Exclude irrelevant sentences. Be concise.
Question: {question}
Context: {context}
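One way the `{context}` slot might be filled is by ranking indexed items against the question embedding and joining the best matches. The sketch below assumes cosine similarity for ranking, a newline-joined context, and a prompt layout with the question and context on separate lines; none of these choices is confirmed by the pipeline description, and `build_rag_prompt` is a hypothetical helper.

```python
import math

RAG_PROMPT = (
    "Answer the following question using only the context below. "
    "Only include information specifically discussed. "
    "Copy the answer verbatim from the context. "
    "Exclude irrelevant sentences. Be concise.\n"
    "Question: {question}\n"
    "Context: {context}"
)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_rag_prompt(question, question_embedding, items, k=2):
    """Pick the k items most similar to the question and fill the prompt.

    `items` is a list of dicts with `content` and `embedding` keys, as in the
    chunks and facts of the search index.
    """
    ranked = sorted(
        items,
        key=lambda item: cosine(question_embedding, item["embedding"]),
        reverse=True,
    )
    context = "\n".join(item["content"] for item in ranked[:k])
    return RAG_PROMPT.format(question=question, context=context)
```

The question embedding is passed in rather than computed, since the embedding model is not specified here.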