
pokkoa baize yingzhao
An open-source project to simplify access to ancient Chinese divination datasets, offering various methods such as HTTP requests. 🚀 It is licensed under the MIT License and is free to use. 🎉 Any feedback is welcome! 💬
About This Server
Model Context Protocol (MCP) - This server can be integrated with AI applications to provide additional context and capabilities, enabling enhanced AI interactions and functionality.
Documentation
Pokkoa Yingzhao
The Pokkoa Yingzhao Text Matching System is a Python-based tool designed to process Chinese text documents, vectorize their content using TF-IDF, and find the most relevant texts matching user queries. It leverages SQLite for storing vectors and provides a simple command-line interface for interactive queries.
Dataset
The ancient Chinese books are stored as .txt files in the txt/ directory.
The txt directory contains:
- 49 volumes of Yi Jing (易经) texts, including various commentaries and interpretations
- 146 volumes of Shu Shu (术数) texts, including divination methods, fortune-telling, and geomancy
- Various other ancient Chinese texts covering topics such as:
  - Divination systems (六壬, 奇门遁甲)
  - Geomancy (风水) and burial practices
  - Physiognomy (相术) and fortune-telling
  - Astronomical and calendrical systems
  - Military strategies and tactics
  - Philosophical interpretations of the Yi Jing
These texts span multiple dynasties, from the pre-Qin period through the Qing dynasty, with authors including prominent figures such as Zhu Xi (朱熹), Su Dongpo (苏东坡), Yang Xiong (杨雄), and many others.
Features
- TF-IDF Vectorization: Uses sklearn's TfidfVectorizer to transform text into vectors.
- Chinese Text Segmentation: Utilizes jieba for Chinese word segmentation.
- SQLite Storage: Stores document vectors and the TF-IDF vectorizer in an SQLite database.
- Similarity Matching: Computes cosine similarity between query input and document vectors.
- Interactive CLI: Allows real-time querying and result display.
- Debug Mode: Offers detailed logging for processing steps.
- Multiple Interfaces: Supports HTTP, gRPC, and MCP (Model Context Protocol).
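The matching step described in the features list (TF-IDF weighting followed by cosine similarity) can be sketched without third-party packages. This is a minimal illustration under stated assumptions, not the project's code: the hypothetical `tokenize` helper splits text into single characters as a stand-in for jieba, and the IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, raw term counts, L2 normalization).

```python
import math
from collections import Counter


def tokenize(text):
    # Stand-in for jieba (assumption): every non-whitespace character becomes a token.
    return [ch for ch in text if not ch.isspace()]


def fit_idf(docs_tokens):
    # Smoothed IDF, as in scikit-learn's default TfidfVectorizer:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term counts times IDF, then L2-normalized.
    # Terms missing from the fitted IDF table are ignored.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine_similarity(a, b):
    # Vectors are unit length, so cosine similarity reduces to a dot product.
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())
```

For example, after fitting on "乾元亨利贞" and "坤元亨利牝马之贞", the query vector `tfidf_vector(tokenize("乾元"), idf)` scores higher against the first text, which contains both characters, than against the second, which shares only 元.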
Stop Words
Stop words are stored in stopwords\stop_words.txt.
The stop-word list comes from https://github.com/elephantnose/characters.
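A minimal sketch of loading and applying the stop-word list, assuming `stop_words.txt` is UTF-8 with one word per line (the file format is not documented here). `load_stop_words` and `remove_stop_words` are hypothetical helpers, not part of the project's API.

```python
from pathlib import Path


def load_stop_words(path):
    # Assumes a UTF-8 file with one stop word per line; blank lines are skipped.
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stop_words(tokens, stop_words):
    # Keep token order; drop anything listed in the stop-word set.
    return [token for token in tokens if token not in stop_words]
```

scikit-learn's `TfidfVectorizer` also accepts a list of stop words through its `stop_words` parameter, so the loaded set could be converted with `list(...)` and passed there instead; which approach the project uses is not stated.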
Installation
Ensure you have Python installed (>= 3.8), then install the necessary dependencies:
pip install numpy jieba scikit-learn
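To confirm the environment before running anything, a quick check using only the standard library can help. This is a sketch; `check_environment` is a hypothetical helper name. Note that the `scikit-learn` package is imported as `sklearn`.

```python
import importlib.util
import sys

# Import names, which differ from pip package names (scikit-learn -> sklearn).
REQUIRED_MODULES = ("numpy", "jieba", "sklearn")


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 8)):
    # Returns (python_ok, {module_name: installed?}) without importing the modules.
    python_ok = sys.version_info >= min_version
    installed = {name: importlib.util.find_spec(name) is not None for name in modules}
    return python_ok, installed
```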
Usage
- Initialize the system:
from text_matching import TextMatchingSystem
# Enable debug mode for detailed logging
system = TextMatchingSystem(debug=True)
- Build vectors from a directory of text files:
Ensure you have a directory (e.g., ./txt) containing .txt files.
system.build_vectors_from_directory('./txt')
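Building vectors presumably starts by reading every .txt file in the directory. The hypothetical `collect_documents` helper below sketches that step, assuming UTF-8 files and a flat (non-recursive) directory; the project's `build_vectors_from_directory` may handle encodings or subdirectories differently.

```python
from pathlib import Path


def collect_documents(directory):
    # Map each .txt file name in the directory to its text.
    # Assumes UTF-8 files; sorted() keeps the order deterministic.
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }
```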
- Find relevant texts for a query:
results = system.find_relevant_text('你的查询文本', top_n=3)
for result in results:
print(f"{result['filename']} (Similarity: {result['similarity']:.4f})")
print(result['text'])
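Selecting the `top_n` results amounts to ordering documents by similarity score. This hypothetical `rank_documents` helper returns dicts with the same `filename`, `similarity`, and `text` keys that the loop above reads; it assumes the scores have already been computed and is not the project's implementation.

```python
import heapq


def rank_documents(similarities, texts, top_n=3):
    # similarities: {filename: score}; texts: {filename: document text}.
    # Returns the top_n highest-scoring entries, best first.
    best = heapq.nlargest(top_n, similarities.items(), key=lambda item: item[1])
    return [
        {"filename": name, "similarity": score, "text": texts[name]}
        for name, score in best
    ]
```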
- Add new documents dynamically:
system.add_new_document('new_file.txt', '这是新的文档内容。')
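The README does not say whether `add_new_document` refits the TF-IDF vectorizer or reuses the stored one. The worked example below, which assumes scikit-learn's default smoothed IDF formula, shows why the choice matters: adding a document changes the document count, so IDF weights computed for the earlier corpus drift.

```python
import math


def smoothed_idf(n_docs, doc_freq):
    # scikit-learn's default (smooth_idf=True) IDF: ln((1 + n) / (1 + df)) + 1
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


# A term that appears in 1 of 2 documents...
before = smoothed_idf(n_docs=2, doc_freq=1)
# ...after adding a third document that does not contain it.
after = smoothed_idf(n_docs=3, doc_freq=1)
```

Here `before` is ln(3/2) + 1 ≈ 1.405 and `after` is ln(4/2) + 1 ≈ 1.693, so vectors produced by a vectorizer fitted on the original corpus no longer match what a refit would produce.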
- Get database statistics:
stats = system.get_database_stats()
print(stats)
Running the CLI
You can run the provided CLI by executing the following command:
python text_matching.py
Follow the prompts to build vectors, check database stats, and query texts interactively.
Database Structure
The SQLite database (text_vectors.db) contains:
- vectorizer table: Stores the serialized TF-IDF vectorizer.
- document_vectors table: Stores document content and their corresponding vectors.
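A minimal `sqlite3` sketch of the two tables. Only the table names come from this README; the column names, the single-row vectorizer layout, and the use of `pickle` for serialization are assumptions. Unpickling should only be done on a database you trust.

```python
import pickle
import sqlite3


def create_schema(conn):
    # Hypothetical columns; the README only names the two tables.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS vectorizer (
            id INTEGER PRIMARY KEY,
            data BLOB NOT NULL          -- pickled TF-IDF vectorizer
        );
        CREATE TABLE IF NOT EXISTS document_vectors (
            filename TEXT PRIMARY KEY,
            content TEXT NOT NULL,
            vector BLOB NOT NULL        -- pickled document vector
        );
        """
    )


def save_vectorizer(conn, vectorizer):
    # Keep a single row (id = 1) holding the pickled vectorizer.
    conn.execute(
        "INSERT OR REPLACE INTO vectorizer (id, data) VALUES (1, ?)",
        (pickle.dumps(vectorizer),),
    )
    conn.commit()


def load_vectorizer(conn):
    # Return the stored vectorizer, or None if none has been saved yet.
    row = conn.execute("SELECT data FROM vectorizer WHERE id = 1").fetchone()
    return pickle.loads(row[0]) if row else None


def save_document(conn, filename, content, vector):
    # Insert or overwrite one document row.
    conn.execute(
        "INSERT OR REPLACE INTO document_vectors (filename, content, vector) "
        "VALUES (?, ?, ?)",
        (filename, content, pickle.dumps(vector)),
    )
    conn.commit()


def load_vectors(conn):
    # Return {filename: vector}, unpickling each stored BLOB.
    rows = conn.execute("SELECT filename, vector FROM document_vectors")
    return {name: pickle.loads(blob) for name, blob in rows}


def document_count(conn):
    # The kind of figure a statistics call might report.
    return conn.execute("SELECT COUNT(*) FROM document_vectors").fetchone()[0]
```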
Debugging
Enable debug mode for verbose logging by initializing the system with:
system = TextMatchingSystem(debug=True)
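How the `debug` flag is wired internally is not documented. One common pattern, sketched here with the standard `logging` module, maps the flag onto a log level; `make_logger` is a hypothetical helper.

```python
import logging


def make_logger(debug=False, name="text_matching"):
    # Hypothetical helper: debug=True emits DEBUG messages, otherwise INFO and above.
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```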
License
This project is licensed under the MIT License. Feel free to use and modify it.
For any questions or feature requests, please open an issue or reach out!
About Pokkoa
- Pokkoa website: pokkoa.com
- LinkedIn: Pokkoa LinkedIn
- Hugging Face: Pokkoa on Hugging Face
- ✉️: contact@pokkoa.cc
Quick Start
Clone the repository:
git clone https://github.com/jebberwocky/pokkoa-baize-yingzhao
Install dependencies:
cd pokkoa-baize-yingzhao
npm install
Follow the documentation:
Check the repository's README.md file for specific installation and usage instructions.