LegadoParser

阅读3.0书源规则解析库

支持大多数 Default Jsonpath Xpath Regex 规则

支持部分 Js 规则

支持部分特殊规则，如 {{ }}、{$. }、正则替换

支持 webView ，使用Selenium驱动，需要安装Chrome浏览器

更多细节请查看实现细节支持文档

Windows下需要Python 3.9版本，其他Python版本的部分依赖安装需要自行编译

安装

Windows （Python 3.9）

pip install git+https://github.com/821938089/LegadoParser#egg=LegadoParser

Linux/WSL （Python 3.8+）

sudo apt-get install libxml2 libxml2-dev
pip install git+https://github.com/821938089/LegadoParser#egg=LegadoParser --no-use-pep517

OCR字体识别可选安装

pip install git+https://github.com/821938089/LegadoParser#egg=LegadoParser[ocr]

安装后使用webView获取章节内容会自动检测是否需要OCR字体识别

局限性较大，无必要不推荐安装

升级

# 就是卸载重装
pip uninstall LegadoParser -y
pip install git+https://github.com/821938089/LegadoParser#egg=LegadoParser

卸载

pip uninstall LegadoParser -y

基础用法

详见 usage.py

高级API

from LegadoParser2.RuleCompile import compileBookSource

def compileBookSource(bookSource, specify=''):
"""
书源规则编译函数

参数 - 描述 - 类型

bookSource - 书源json - dict
specify - 只编译指定规则组 - str


specify 可选 ('ruleSearch', 'ruleBookInfo', 'ruleToc', 'ruleContent')

"""

from LegadoParser2.Search import search

def search(compiledBookSource, key, page=1):
"""
搜索函数

参数 - 描述 - 类型

compiledBookSource- 经过compileBookSource函数编译的书源规则 - dict
key - 搜索 - str
page - 页数 - int


注意：如果搜索后直接跳转到了详情页，将调用parseBookInfo获取信息。

"""

from LegadoParser2.BookInfo import getBookInfo

def getBookInfo(compiledBookSource, url, variables):
"""
获取详情信息

参数 - 描述 - 类型

compiledBookSource- 经过compileBookSource函数编译的书源规则 - dict
url - search函数中返回的 bookUrl 或 tocUrl - str
variables - search函数中返回的variables - dict

"""

from LegadoParser2.ChapterList import getChapterList

def getChapterList(compiledBookSource, url, variables):
"""
获取章节列表

参数 - 描述 - 类型

compiledBookSource- 经过compileBookSource函数编译的书源规则 - dict
url - getBookInfo函数中返回的tocUrl - str
variables - getBookInfo函数中返回的variables - dict

"""

from LegadoParser2.Chapter import getChapterContent

def getChapterContent(compiledBookSource, url, variables, nextChapterUrl=''):
"""
获取章节内容

参数 - 描述 - 类型

compiledBookSource- 经过compileBookSource函数编译的书源规则 - dict
url - getChapterList函数中返回的url - str
variables - getChapterList函数中返回的variables - dict
nextChapterUrl - 下一章的url - str

"""

from LegadoParser2.RuleEval import getElements, getString, getStrings
from LegadoParser2.RuleCompile import getRuleObj

# 基础API，根据规则提取数据
# 详细参数见源码

示例结果

--------------------开始搜索--------------------
{'author': '遥的海王琴',
 'bookUrl': 'https://www.zhaishuyuan.org/book/9/9256/',
 'coverUrl': 'https://img.zhaishuyuan.org/9/9256/9256s.jpg',
 'intro': '方瑾凌刚醒过来的时候，正好听到云阳侯将外室接了回来，据说私生子分外出息，欢欢喜喜地准备认祖归宗。府里上下都觉得云阳侯要舍弃活不长久的嫡子，培养庶子，等着看云阳侯夫人和方瑾
凌的笑话。然而云阳侯夫人却守在方瑾凌身边，放下一句：“凌儿，娘想和离。”云阳侯只道他的夫人只是一句狠话，温柔贤惠的性子哪儿敢真走。可没想到，春节未过，西陵侯府来人敲开了大门，一字排开 
的尚家小姐们恭请小姑姑和小表弟回家。看着这一二三',
 'kind': '连载,其他,55分钟前',
 'lastChapter': '第194章 谋逆蛊惑太子悖逆人伦55分钟前',
 'name': '我的江山，你随便捏',
 'wordCount': '98字'}
--------------------开始获取详情--------------------
{'author': '遥的海王琴',
 'coverUrl': 'https://img.zhaishuyuan.org/9/9256/9256s.jpg',
 'intro': '\u3000\u3000'
          '方瑾凌刚醒过来的时候，正好听到云阳侯将外室接了回来，据说私生子分外出息，欢欢喜喜地准备认祖归宗。府里上下都觉得云阳侯要舍弃活不长久的嫡子，培养庶子，等着看云阳侯夫人和方瑾
凌的笑话。然而云阳侯夫人却守在方瑾凌身边，放下一句：“凌儿，娘想和离。”云阳侯只道他的夫人只是一句狠话，温柔贤惠的性子哪儿敢真走。可没想到，春节未过，西陵侯府来人敲开了大门，一字排开 
的尚家小姐们恭请小姑姑和小表弟回家。看着这一二三',
 'kind': '其他,连载,2022-02-02',
 'lastChapter': '',
 'name': '我的江山，你随便捏',
 'tocUrl': 'https://www.zhaishuyuan.org/book/9/9256/',
 'wordCount': '98 万字'}
--------------------开始获取章节列表--------------------
列表大小193
[{'name': '第1章 寒冬 ',
  'url': 'https://www.zhaishuyuan.org/book/9256/7423287.html'},
 {'name': '第2章 苏醒 ',
  'url': 'https://www.zhaishuyuan.org/book/9256/7423288.html'},
 {'name': '第3章 做戏 ',
  'url': 'https://www.zhaishuyuan.org/book/9256/7423289.html'}]
--------------------开始获取内容--------------------
略
--------------------结束--------------------

Name		Name	Last commit message	Last commit date
Latest commit History 131 Commits
.github/workflows		.github/workflows
JsExtension		JsExtension
LegadoParser2		LegadoParser2
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
LICENSE_zh-CN		LICENSE_zh-CN
README.md		README.md
booksource.txt		booksource.txt
requirements.txt		requirements.txt
ruleDoc.md		ruleDoc.md
setup.py		setup.py
usage.py		usage.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LegadoParser

安装

Windows （Python 3.9）

Linux/WSL （Python 3.8+）

OCR字体识别可选安装

升级

卸载

基础用法

高级API

示例结果

License

About

Releases

Packages

Languages

License

821938089/LegadoParser

Folders and files

Latest commit

History

Repository files navigation

LegadoParser

安装

Windows （Python 3.9）

Linux/WSL （Python 3.8+）

OCR字体识别可选安装

升级

卸载

基础用法

高级API

示例结果

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages