The system can answer complex queries, such as those that combine information from multiple pages or even multiple documents. The data and Q&A pairs come from the Docugami Knowledge Graph Retrieval Augmented Generation Datasets. Details on how the question and answer sets were developed can be found in the SEC Form 10-Q README.
The SEC Form 10-Q is a quarterly report required by the Securities and Exchange Commission (SEC) that provides unaudited financial statements and other information about a company's operations and financial condition. While there is a basic formatting standard, different companies' forms can vary in style and explanation. Most importantly, these forms contain complex tables that store valuable information; accurately extracting this information is crucial.
This app contains forms for the following companies: Apple, Intel, Microsoft, and Nvidia, from 2022 Q3 to 2024 Q1.
- Data Framework: LlamaIndex
- PDF Parsing: LlamaParse
- Vector DB: Pinecone
- LLM: OpenAI GPT-4o-mini
- Embedding Model: OpenAI Text Embedding 3 Large
- Reranking: Cohere
- Observability: Langfuse
- Containerization: Docker
- Cloud Platform: Google Cloud Run
- GUI Framework: Streamlit
- Chunking Strategy: MarkdownElementNodeParser^1
- Total pages of docs: 1364
- Report Period: 2022Q3 - 2024Q1
- Reporting Companies: Apple, Amazon, Intel, Microsoft, Nvidia
- Vector Length: 1024
- Similarity Metric: Cosine
- Prompt Management: LangChain Hub
- Query Rephrasing: Dynamic metadata retrieval
- Node Postprocessors:
- Embedding: ~$0.16
- LLM: ~$1.83
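The cosine similarity metric listed above scores how closely a query embedding points in the same direction as a chunk embedding, regardless of vector magnitude. The following is a minimal sketch in plain Python for illustration only; in this stack the comparison is performed inside the vector database, and the function name `cosine_similarity` is a hypothetical helper, not part of any library used here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as unrelated (assumption).
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings here have 1024 dimensions.
    query = [0.1, 0.3, 0.5]
    chunk = [0.2, 0.6, 1.0]
    print(cosine_similarity(query, chunk))
```

Because the score depends only on direction, two vectors that differ only by a positive scale factor are treated as a perfect match.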
Question Type | Question | Source Docs |
---|---|---|
Multi-Doc | How has Apple's total net sales changed over time? | 2022-Q3-AAPL, 2023-Q1-AAPL, 2023-Q2-AAPL, 2023-Q3-AAPL, 2024-Q1-AAPL |
Single-Doc | How does Microsoft's revenue distribution across its various business segments in the latest 10-Q compare to the cost of sales for those segments? | 2023-Q3-MSFT |
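To illustrate how a question like those in the table might be narrowed to the right source documents, here is a rough sketch of pulling a company and reporting period out of the question text to use as metadata filters. This is not the app's actual implementation: the function `extract_filters`, the `TICKERS` mapping, and the filter keys are hypothetical, and the tickers other than AAPL and MSFT are assumed rather than taken from the table.

```python
import re
from typing import Dict, List

# Hypothetical company-name to ticker mapping (assumption for illustration).
TICKERS: Dict[str, str] = {
    "apple": "AAPL",
    "intel": "INTC",
    "microsoft": "MSFT",
    "nvidia": "NVDA",
}

# Matches periods written as "2023 Q3", "2023-Q3", or "2023Q3".
PERIOD_RE = re.compile(r"\b(20\d{2})\s*-?\s*Q([1-4])\b", re.IGNORECASE)


def extract_filters(question: str) -> Dict[str, List[str]]:
    """Return the tickers and periods mentioned in a question."""
    lowered = question.lower()
    # Whole-word match so that "intel" does not match "intelligence".
    tickers = [
        ticker
        for name, ticker in TICKERS.items()
        if re.search(rf"\b{name}\b", lowered)
    ]
    periods = [f"{year}-Q{quarter}" for year, quarter in PERIOD_RE.findall(question)]
    return {"ticker": tickers, "period": periods}


if __name__ == "__main__":
    print(extract_filters("How has Apple's total net sales changed over time?"))
    print(extract_filters("Compare Microsoft and Nvidia revenue in 2023 Q3"))
```

An empty `period` list, as in the multi-document Apple question, would leave the search open to every quarter for that company, while a populated list would restrict retrieval to the matching filings.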
- FastAPI integration
- Concurrency/Parallelization
- Caching
- Tool usage (e.g., pulling trading symbols)
- Image extraction & multi-modality
- Guardrailing