From 0e3a7cd6323b2d03a9a330ddbc37d294916394d2 Mon Sep 17 00:00:00 2001
From: Alex Ghiculescu
Date: Sun, 5 Mar 2023 15:28:55 -0700
Subject: [PATCH] Add `text_splitters`

https://github.com/ghiculescu/text_splitters contains a ruby port of
https://langchain.readthedocs.io/en/latest/modules/indexes/examples/textsplitter.html
(just one splitter so far, but more to come)
---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 9327c59..1f35591 100644
--- a/README.md
+++ b/README.md
@@ -39,6 +39,7 @@ A collection of Natural Language Processing (NLP) Ruby libraries, tools and soft
 * [Summarization](#summarization)
 * [Text Extraction](#text-extraction)
 * [Text Similarity](#text-similarity)
+* [Text Splitting](#text-splitting)
 * [Text-to-Speech](#text-to-speech)
 * [Tokenizers](#tokenizers)
 * [Word Count](#word-count)
@@ -404,6 +405,10 @@ Automatic summarization is the process of reducing a text document with a comput
 * [TF-IDF](https://github.com/reddavis/TF-IDF) - Term Frequency - Inverse Document Frequency in Ruby
 * [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity) - calculate the similarity between texts using tf*idf
 
+## Text Splitting
+
+* [text_splitters](https://github.com/ghiculescu/text_splitters) - Port of Langchain text splitters
+
 ## Text-to-Speech
 
 * [espeak-ruby](https://github.com/dejan/espeak-ruby) - small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files