Commit 6951b7b

Merge pull request #879 from ScrapeGraphAI/pre/beta
New Features, Examples Refactoring and Bug Fix
2 parents 02022cc + 04bd3f8 commit 6951b7b

538 files changed, +1737 -24112 lines changed

CHANGELOG.md

+30
@@ -1,3 +1,33 @@
+## [1.36.0-beta.1](https://github.com./ScrapeGraphAI/Scrapegraph-ai/compare/v1.35.1-beta.1...v1.36.0-beta.1) (2025-01-12)
+
+
+### Features
+
+* add example of collab ([1fad118](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/1fad1181a6b2d654c4eb996348907940b1d8a7af))
+
+
+### Bug Fixes
+
+* updated ollama structured output ([3b95911](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/3b9591156d96ac7266055703e7ffb354e90b01f0))
+
+
+### Docs
+
+* improved readme + fix csv scraper imports ([14b4b19](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/14b4b19f60e33c855bee4eea0a1a6fcc01a98c1a))
+* refactoring of the doc ([5ca325c](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/5ca325c7257b71fc4cd12ee26bde3e992ade5756))
+
+## [1.35.1-beta.1](https://github.com./ScrapeGraphAI/Scrapegraph-ai/compare/v1.35.0...v1.35.1-beta.1) (2025-01-12)
+
+
+### Bug Fixes
+
+* ollama tokenizer limited to 1024 tokens + ollama structured output + fix browser backend ([ad693b2](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/ad693b2bb201b4d9280139e70a2930358e779366))
+
+
+### Docs
+
+* ✨ code quality badge update ([02022cc](https://github.com./ScrapeGraphAI/Scrapegraph-ai/commit/02022cc5db39fede1a1d920d17e18ba0d05328ba))
+
 ## [1.35.0](https://github.com./ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.2...v1.35.0) (2025-01-06)
 
 

cookbook/README.md

-9
This file was deleted.

docs/source/getting_started/installation.rst

+6-6
@@ -25,18 +25,18 @@ The library is available on PyPI, so it can be installed using the following com
 
 It is higly recommended to install the library in a virtual environment (conda, venv, etc.)
 
-If your clone the repository, it is recommended to use a package manager like `rye <https://rye.astral.sh/>`_.
-To install the library using rye, you can run the following command:
+If your clone the repository, it is recommended to use a package manager like `uv <https://github.com./astral-sh/uv>`_.
+To install the library using uv, you can run the following command:
 
 .. code-block:: bash
 
-    rye pin 3.10
-    rye sync
-    rye build
+    uv pin 3.10
+    uv sync
+    uv build
 
 .. caution::
 
-    **Rye** must be installed first by following the instructions on the `official website <https://rye.astral.sh/>`_.
+    **Rye** must be installed first by following the instructions on the `official website <https://github.com./astral-sh/uv>`_.
 
 Additionally on Windows when using WSL
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

docs/source/introduction/overview.rst

+70-9
@@ -30,37 +30,93 @@ ScrapGraphAI supports a wide range of AI models from various providers. Each mod
 OpenAI Models
 -------------
 - GPT-3.5 Turbo (16,385 tokens)
-- GPT-4 (8,192 tokens)
+- GPT-3.5 (4,096 tokens)
+- GPT-3.5 Turbo Instruct (4,096 tokens)
 - GPT-4 Turbo Preview (128,000 tokens)
-- GPT-4o (128000 tokens)
-- GTP-4o-mini (128000 tokens)
+- GPT-4 Vision Preview (128,000 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 32k (32,768 tokens)
+- GPT-4o (128,000 tokens)
+- O1 Preview (128,000 tokens)
+- O1 Mini (128,000 tokens)
 
 Azure OpenAI Models
 -------------------
 - GPT-3.5 Turbo (16,385 tokens)
-- GPT-4 (8,192 tokens)
+- GPT-3.5 (4,096 tokens)
 - GPT-4 Turbo Preview (128,000 tokens)
-- GPT-4o (128000 tokens)
-- GTP-4o-mini (128000 tokens)
+- GPT-4 (8,192 tokens)
+- GPT-4 32k (32,768 tokens)
+- GPT-4o (128,000 tokens)
+- O1 Preview (128,000 tokens)
+- O1 Mini (128,000 tokens)
 
 Google AI Models
 ----------------
 - Gemini Pro (128,000 tokens)
+- Gemini 1.5 Flash (128,000 tokens)
 - Gemini 1.5 Pro (128,000 tokens)
+- Gemini 1.0 Pro (128,000 tokens)
 
 Anthropic Models
 ----------------
 - Claude Instant (100,000 tokens)
-- Claude 2 (200,000 tokens)
+- Claude 2 (9,000 tokens)
+- Claude 2.1 (200,000 tokens)
 - Claude 3 (200,000 tokens)
+- Claude 3.5 (200,000 tokens)
+- Claude 3 Opus (200,000 tokens)
+- Claude 3 Sonnet (200,000 tokens)
+- Claude 3 Haiku (200,000 tokens)
 
 Mistral AI Models
 -----------------
-- Mistral Large (128,000 tokens)
+- Mistral Large Latest (128,000 tokens)
+- Open Mistral Nemo (128,000 tokens)
+- Codestral Latest (32,000 tokens)
 - Open Mistral 7B (32,000 tokens)
 - Open Mixtral 8x7B (32,000 tokens)
+- Open Mixtral 8x22B (64,000 tokens)
+- Open Codestral Mamba (256,000 tokens)
 
-For a complete list of supported models and their token limits, please refer to the API documentation.
+Ollama Models
+-------------
+- Command-R (12,800 tokens)
+- CodeLlama (16,000 tokens)
+- DBRX (32,768 tokens)
+- DeepSeek Coder 33B (16,000 tokens)
+- Llama2 Series (4,096 tokens)
+- Llama3 Series (8,192-128,000 tokens)
+- Mistral Models (32,000-128,000 tokens)
+- Mixtral 8x22B Instruct (65,536 tokens)
+- Phi3 Series (12,800-128,000 tokens)
+- Qwen Series (32,000 tokens)
+
+Hugging Face Models
+------------------
+- Grok-1 (8,192 tokens)
+- Meta Llama 3 Series (8,192 tokens)
+- Google Gemma Series (8,192 tokens)
+- Microsoft Phi Series (2,048-131,072 tokens)
+- GPT-2 Series (1,024 tokens)
+- DeepSeek V2 Series (131,072 tokens)
+
+Bedrock Models
+-------------
+- Claude 3 Series (200,000 tokens)
+- Llama2 & Llama3 Series (4,096-8,192 tokens)
+- Mistral Series (32,768 tokens)
+- Titan Embed Text (8,000 tokens)
+- Cohere Embed (512 tokens)
+
+Fireworks Models
+---------------
+- Llama V2 7B (4,096 tokens)
+- Mixtral 8x7B Instruct (4,096 tokens)
+- Llama 3.1 Series (131,072 tokens)
+- Mixtral MoE Series (65,536 tokens)
+
+For a complete and up-to-date list of supported models and their token limits, please refer to the API documentation.
 
 Understanding token limits is crucial for optimizing your scraping tasks. Larger token limits allow for processing more text in a single API call, which can be beneficial for scraping lengthy web pages or documents.
 
@@ -139,3 +195,8 @@ Sponsors
     :width: 15%
     :alt: Stat Proxies
     :target: https://dashboard.statproxies.com/?refferal=scrapegraph
+
+.. image:: ../../assets/scrapedo.png
+    :width: 11%
+    :alt: Scrapedo
+    :target: https://scrape.do
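
The context paragraph kept in `overview.rst` says that larger token limits let more text be processed in a single API call. A minimal sketch of that trade-off follows. The three limits are copied from the lists added in this commit, but the dictionary keys, the `TOKEN_LIMITS`, `estimate_tokens` and `calls_needed` names, the four-characters-per-token heuristic, and the reserved-token budget are all illustrative assumptions, not part of ScrapeGraphAI.

```python
import math

# Limits copied from the overview.rst lists in this commit.
# The key spellings are hypothetical identifiers chosen for this sketch.
TOKEN_LIMITS = {
    "gpt-4": 8_192,
    "gpt-4o": 128_000,
    "claude-3-opus": 200_000,
}

# Assumption: a crude average of four characters per token.
# Real tokenizers differ per model and per language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based only on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def calls_needed(text: str, model: str, reserved: int = 1_000) -> int:
    """Estimate how many API calls are needed to process `text`.

    `reserved` tokens are held back for the prompt and the answer
    (an arbitrary value picked for illustration).
    """
    budget = TOKEN_LIMITS[model] - reserved
    if budget <= 0:
        raise ValueError(f"reserved budget exceeds the limit of {model}")
    return max(1, math.ceil(estimate_tokens(text) / budget))


if __name__ == "__main__":
    page = "x" * 400_000  # about 100,000 estimated tokens
    for name in TOKEN_LIMITS:
        print(name, calls_needed(page, name))
```

With this heuristic, a 400,000-character page needs 14 calls against an 8,192-token limit but only one against a 128,000-token limit, which is the effect the paragraph describes.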

docs/source/modules/scrapegraphai.helpers.models_tokens.rst

+1-1
@@ -19,7 +19,7 @@ Example usage:
     print(f"GPT-4 token limit: {gpt4_limit}")
 
     # Check the token limit for a specific model
-    model_name = "gpt-3.5-turbo"
+    model_name = "gpt-4o-mini"
     if model_name in models_tokens['openai']:
         print(f"{model_name} token limit: {models_tokens['openai'][model_name]}")
     else:
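
The hunk is cut off at `else:`, so the fallback branch of the documented example is not visible here. As a sketch only, the same membership check can be wrapped in a helper that returns a default when a model is missing. The `get_token_limit` function and `DEFAULT_LIMIT` value are hypothetical, and the small dictionary below is a stand-in for the library's `models_tokens` so that the snippet runs on its own; its two values are taken from the model lists in this commit.

```python
from typing import Dict, Mapping

# Stand-in for the library's `models_tokens` dictionary (assumption: the
# real one maps provider name -> model name -> token limit, as the
# documented example `models_tokens['openai'][model_name]` suggests).
models_tokens: Dict[str, Dict[str, int]] = {
    "openai": {
        "gpt-4": 8192,
        "gpt-4o-mini": 128000,
    },
}

# Hypothetical conservative fallback for models missing from the table.
DEFAULT_LIMIT = 8192


def get_token_limit(
    provider: str,
    model_name: str,
    table: Mapping[str, Mapping[str, int]] = models_tokens,
    default: int = DEFAULT_LIMIT,
) -> int:
    """Return the token limit for a model, or `default` if it is unknown."""
    return table.get(provider, {}).get(model_name, default)


if __name__ == "__main__":
    for name in ("gpt-4o-mini", "not-a-real-model"):
        print(f"{name} token limit: {get_token_limit('openai', name)}")
```

Returning a default avoids a `KeyError` for unlisted models, at the cost of silently using a limit that may be wrong; raising instead is the stricter choice.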

docs/source/scrapers/benchmarks.rst

-23
This file was deleted.

examples/ScrapegraphAI_cookbook.ipynb

+915
Large diffs are not rendered by default.

examples/anthropic/.env.example

-1
This file was deleted.

examples/anthropic/code_generator_graph_anthropic.py

-59
This file was deleted.

examples/anthropic/csv_scraper_anthropic.py

-56
This file was deleted.

examples/anthropic/csv_scraper_graph_multi_anthropic.py

-50
This file was deleted.
