Add Pydantic typing #1104
Hello, I'm not very experienced with Pydantic, to be honest. Can you explain how this is useful or how it eases future maintenance? I know it removes a lot of code duplication, but are there other benefits?
We have a use case where we need to serialize the output of `transcribe` and load it back later, but NamedTuples do not natively support nested deserialization. For example, serializing the segments like this:

```python
import json

segments, info = model.transcribe(audio, word_timestamps=True)
out = [s._asdict() for s in segments]
print(json.dumps(out, indent=2))
```

produces the following JSON:

```json
[
  {
    "id": 1,
    "seek": 2490,
    "start": 0.0,
    "end": 6.0,
    "text": " Ladies and gentlemen, thank you for being here and for your written representations.",
    "tokens": [
      50364,
      //...
    ],
    "avg_logprob": -0.26516544380608725,
    "compression_ratio": 1.597883597883598,
    "no_speech_prob": 0.01361083984375,
    "words": null,
    "temperature": 0.0
  },
  //...
]
```

If we try to convert this back to a `Segment`:

```python
from faster_whisper.transcribe import Segment

# `parsed` is the list of dicts loaded back from the JSON above (e.g. with json.loads)
[Segment(**s) for s in parsed]
```

the nested `words` come back as plain lists instead of `Word` objects:

```
Segment(
    id=1,
    seek=2490,
    start=0.0,
    end=4.62,
    text=" Ladies and gentlemen, thank you for being here and for your written representations.",
    tokens=[
        50364,
        # ...
    ],
    avg_logprob=-0.26516544380608725,
    compression_ratio=1.597883597883598,
    no_speech_prob=0.01361083984375,
    words=[
        [0.0, 0.32, " Ladies", 0.310546875],
        [0.32, 0.54, " and", 0.89453125],
        # ...
    ],
    temperature=0.0,
)
```

Currently, we are recursing through the …

This should also fix #667.
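To make the problem concrete without depending on faster_whisper, here is a minimal sketch using simplified, hypothetical stand-ins for `Word` and `Segment` (the real classes have more fields). It shows that `_asdict()` is shallow, so nested `Word` tuples are written out as JSON arrays, and that restoring them needs a hand-written recursive step, here a hypothetical `segment_from_dict` helper:

```python
import json
from typing import List, NamedTuple, Optional


# Simplified, hypothetical stand-ins for illustration only; the real
# faster_whisper Segment also has seek, tokens, avg_logprob, etc.
class Word(NamedTuple):
    start: float
    end: float
    word: str
    probability: float


class Segment(NamedTuple):
    id: int
    start: float
    end: float
    text: str
    words: Optional[List[Word]]


segment = Segment(
    id=1,
    start=0.0,
    end=0.54,
    text=" Ladies and",
    words=[
        Word(0.0, 0.32, " Ladies", 0.310546875),
        Word(0.32, 0.54, " and", 0.89453125),
    ],
)

# _asdict() is shallow: the nested Word tuples stay tuples, and json.dumps
# writes tuples as plain arrays, so the field names of Word are lost.
payload = json.dumps(segment._asdict())

# Naive round trip: words come back as lists, not Word instances.
naive = Segment(**json.loads(payload))


def segment_from_dict(data: dict) -> Segment:
    """Hypothetical helper: rebuild the nested Word tuples by hand."""
    words = data.get("words")
    if words is not None:
        words = [Word(*w) for w in words]
    return Segment(**{**data, "words": words})


restored = segment_from_dict(json.loads(payload))
```

Every nested NamedTuple type needs its own rebuild step like this one, which is the kind of boilerplate a Pydantic model handles through validation.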
I see, but why not use …?
I think dataclasses are not serializable to JSON, at least not without another library:

```python
import json
from dataclasses import dataclass


@dataclass
class Foo:
    x: str


foo = Foo(x="bar")
json.dumps(foo)
# TypeError: Object of type Foo is not JSON serializable
```
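For comparison, a minimal sketch of the same round trip with Pydantic, assuming Pydantic v2 (`model_dump_json` and `model_validate_json`) and hypothetical, simplified `Word` and `Segment` models rather than the ones proposed in this PR:

```python
from typing import List, Optional

from pydantic import BaseModel


# Hypothetical, simplified models for illustration only.
class Word(BaseModel):
    start: float
    end: float
    word: str
    probability: float


class Segment(BaseModel):
    id: int
    start: float
    end: float
    text: str
    words: Optional[List[Word]] = None


segment = Segment(
    id=1,
    start=0.0,
    end=0.54,
    text=" Ladies and",
    words=[
        Word(start=0.0, end=0.32, word=" Ladies", probability=0.310546875),
        Word(start=0.32, end=0.54, word=" and", probability=0.89453125),
    ],
)

# Nested models are dumped as JSON objects with named keys, not positional arrays.
payload = segment.model_dump_json()

# Validation rebuilds the nested Word models without a hand-written helper.
restored = Segment.model_validate_json(payload)
```

Because the schema is declared once on the model, the same definition drives both serialization and type-checked deserialization.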
Closed in favour of #1105
Hi, we are using this library extensively at our company, and internally we have added type validation with Pydantic. If this is useful, I can update the documentation as well.