tevfikcagridural/rag_on_sec_documents

Small Scale Production-Grade RAG System

The system can answer complex queries, such as those that require combining information from multiple pages or even multiple documents. The data and Q&A pairs originate from the Docugami Knowledge Graph Retrieval Augmented Generation Datasets. Details on how the question and answer sets were developed can be found in the SEC Form 10-Q README.

Documents Information

The SEC Form 10-Q is a quarterly report required by the Securities and Exchange Commission (SEC) that provides unaudited financial statements and other information about a company's operations and financial condition. While there is a basic formatting standard, different companies' forms can vary in style and explanation. Most importantly, these forms contain complex tables that store valuable information; accurately extracting this information is crucial.

This app contains forms for the following companies: Apple, Intel, Microsoft, and Nvidia, covering 2022 Q3 to 2024 Q1.

Tech Stack

Other Details

  • Chunking Strategy: MarkdownElementNodeParser^1
  • Total pages of docs: 1364
  • Report Period: 2022Q3 - 2024Q1
  • Reporting Companies: Apple, Amazon, Intel, Microsoft, Nvidia
  • Vector Length: 1024
  • Similarity Metric: Cosine
  • Prompt Management: LangchainHub
  • Query Rephrasing: Dynamic metadata retrieval
  • Node Postprocessors:
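
The vector length and similarity metric listed above can be illustrated with a small, dependency-free sketch. It is not the repository's retrieval code: the function names `cosine_similarity` and `top_k`, and the in-memory dictionary standing in for a vector store, are assumptions made purely for illustration.

```python
import math

# Embedding dimensionality taken from the "Vector Length" entry above.
EMBEDDING_DIM = 1024


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine is undefined for a zero vector; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Rank chunks by cosine similarity to the query (hypothetical helper).

    chunk_vecs maps a chunk id to its embedding vector.
    Returns a list of (score, chunk_id) pairs, best match first.
    """
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    return sorted(scored, reverse=True)[:k]
```

Because cosine similarity ignores vector magnitude, two chunks whose embeddings point in the same direction score the same regardless of length, which is why it is a common choice for text embeddings.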

Ingestion Cost

  • Embedding: ~$0.16
  • LLM: ~$1.83

Sample Questions

| Question Type | Question | Source Docs |
| --- | --- | --- |
| Multi-Doc | How has Apple's total net sales changed over time? | 2022-Q3-AAPL, 2023-Q1-AAPL, 2023-Q2-AAPL, 2023-Q3-AAPL, 2024-Q1-AAPL |
| Single-Doc | How does Microsoft's revenue distribution across its various business segments in the latest 10-Q compare to the cost of sales for those segments? | 2023-Q3-MSFT |
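
The source document names above follow a `YEAR-QUARTER-TICKER` pattern, which is the kind of metadata a multi-document question needs in order to select the right filings. The sketch below shows one way such identifiers could be parsed and filtered. It assumes the naming pattern holds for every document, and `DocId`, `parse_doc_id`, and `filter_docs` are hypothetical names, not functions from this repository.

```python
import re
from typing import NamedTuple


class DocId(NamedTuple):
    """Metadata recovered from a document name such as '2023-Q3-MSFT'."""
    year: int
    quarter: int
    ticker: str


# Assumed naming pattern: four-digit year, quarter 1-4, uppercase ticker.
_DOC_ID_PATTERN = re.compile(r"^(\d{4})-Q([1-4])-([A-Z]+)$")


def parse_doc_id(doc_id):
    """Split a document name into (year, quarter, ticker)."""
    match = _DOC_ID_PATTERN.match(doc_id.strip())
    if match is None:
        raise ValueError(f"unrecognized document id: {doc_id!r}")
    return DocId(int(match.group(1)), int(match.group(2)), match.group(3))


def filter_docs(doc_ids, ticker, start=None, end=None):
    """Return the documents for one ticker, oldest first.

    start and end are optional inclusive (year, quarter) bounds.
    """
    selected = []
    for doc_id in doc_ids:
        parsed = parse_doc_id(doc_id)
        period = (parsed.year, parsed.quarter)
        if parsed.ticker != ticker:
            continue
        if start is not None and period < start:
            continue
        if end is not None and period > end:
            continue
        selected.append((period, doc_id))
    # Sort chronologically so trends over time can be read in order.
    return [doc_id for _, doc_id in sorted(selected)]
```

For a question like the Multi-Doc example, calling `filter_docs(all_ids, "AAPL")` would narrow retrieval to Apple's filings in chronological order before any similarity search is run.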

Further Improvements

  • FastAPI integration
  • Concurrency/Parallelization
  • Caching
  • Tool usage (e.g., pulling trading symbols)
  • Image extraction & multi-modality
  • Guardrailing
