Field data overhaul #662

azmeuk · 2020-10-06T17:40:44Z

I think there are some inconsistencies in the way data is handled by fields:

How things works now

Here is a small reminder on how inputs are handled in forms and fields.

input

There are several ways to put data into a field:

with request form data (Form.process() formdata argument);
with a placeholder object data (Form.process() obj argument);
with a placeholder data dict (Form.process() data argument);
with a field default value (Field default argument).

process

Then in Field.process():

Field.process_formdata() processes the form data. The input in always a list of strings. The form data is stored in Field.raw_data, then it may be coerced in a convenient python type, and is finally stored the result in Field.data;
Field.process_data() processes the object data value, or the data dict value, or the field default value (in that order). The input can have any type. The form data is stored in Field.object_data, then it may be coerced in a convenient python type, and is finally stored the result in Field.data.
In the end filters are successively applied, and may transform Field.data.

process_formdata() and process_data() may find errors and add them into Field.process_errors

validate

Validation consists in:

Field.pre_validate();
inline and extra validators;
Field.post_validate().

Generally pre_validate(), post_validate() and the validators handle Field.data, that is python data. In addition to the process errors validate() adds the validation errors into Field.errors.

output

Field._value() returns a value to display for the widgets. Generally it transforms Field.data in a htmlized form, but sometimes it takes shortcuts by returning the field formdata input (in Field.raw_data):
https://github.com/wtforms/wtforms/blob/9cc7c7f4d32c8e69c67460e8b8b2aa5ed411da2e/src/wtforms/fields/core.py#L707-L712

Problems

both `process_formdata()` and `process_data()` are executed

If the field has some formdata, then process_formdata() will overwrite self.data if it has been previously set by process_data(). It seems useless to execute process_data() at all then.

https://github.com/wtforms/wtforms/blob/9cc7c7f4d32c8e69c67460e8b8b2aa5ed411da2e/src/wtforms/fields/core.py#L342-L356

I suggest not executing process_data() if formdata is not None.

validation is always attempted

As discussed in #61, #360, #435, even if Field.process_data() or Field.process_formdata() raise a ValueError, and then Field.validate() is called, then validation on that field will be attempted.

I suggest that Field.validate() skips and always return False if Field.process_error is not empty.

validators handle one single value

In 2015, among the goals for v3 #154 tells us:

Distinguish fields that only use one value, use zero or one value, or use multiple values, and allow compatibility to be had based on some of these facets.

Some fields have multiple data (SelectMultipleField or MultipleFileField), most field have one unique value, but I cannot think of any field without value. Even submit buttons have a value.

Validators usually expect Field.data to be a single value, and sometimes (#442) that can cause issues. I can see two way to solve this:

Let the validators handle single or multiple values. I expect that handling multiple values will likely be about looping over single values. This can generate boilerplate.
Mark fields and validators:
- mark the fields as single or multiple. A single field (the default) means that it holds only one value, a multiple field means that it holds several values ;
- mark the validators as single or multiple. A single validators means it validates one single data, a multiple validator means it validates a set of data ;
- a unique validator on a unique field validates the field value ;
- a unique validator on a multiple field validates separately all the values ;
- a multiple validator on a multiple field validate the whole values at once ;
- a multiple validator on a single filed raises an error.

The latter implementation would not concern a lot of builtin validators, but might be useful for custom user-written validators.

I suggest implement the latter option. This brings flexibility for a cheap cost.

should `process_formdata()` and `process_data()` be flexible with the input?

As mentioned in #659, an IntegerField will silently accept and cast a float input. If the float is coming from the form, I think this should be refused by process_formdata(). But what if the float is coming from objdata? It feels pretty convenient to have silent casts in that situation.

I suggest accepting and documenting the policy: process_formdata() won't cast anything silently, but process_data() will do.

python input types are not always handled correctly

I said earlier that the input for process_data can have any type. However there are some cases that break those principles (as discussed in #609). For example, DateTimeField cannot validate datetime.datetime input data:

>>> import wtforms
>>> import datetime
>>> class F(wtforms.Form):
...     foo = wtforms.DateTimeField(
...         default=datetime.datetime.now(),
...     )
>>> F().validate()
Traceback (most recent call last):
...
TypeError: strptime() argument 1 must be str, not datetime.datetime

I suggest writing tests for every field, passing them python values as field default value, and make sure that the fields validate.

Field._value() makes false assumptions

Field._value directly returning the input formdata (when there is one) assumes that the HTML data input and the HTML data output will always be the same. But as the process step may transform the data, for example with filters, this assumption won't alway be true, and the data returned will be incorrect.

I suggest avoiding returning raw data in Field._value(), unless there has been errors processing the field.

The text was updated successfully, but these errors were encountered:

davidism · 2020-11-04T17:06:21Z

I suggest not executing process_data() if formdata is not None.

I agree. Either the form was submitted and only that data is processed, or the form is setting up defaults and only that data is processed.

I suggest that Field.validate() skips and always return False if Field.process_error is not empty.

I agree, but I think there's some related issues to address as well.

Validators can't trust the incoming data. It might be none, or it might be invalid according to a previous validator that didn't raise StopValidation. This also gets into the weird way required fields are implemented. To be entirely correct, if you pass any validators at all to a field, the list should always start with Required or Optional so that validators can assume they're actually getting data, but that's pretty annoying to have to remember to write out.
I don't see much discussion of StopValidation in Q&A and tutorials. The only use case I can think of for having validators that don't stop the chain is passwords, where there are multiple requirements that each have a unique error and should show all the missing requirements. Writing a chain of validators like that would be really verbose. Instead, I write one validator that does field.errors.append for each part.
InputRequired and DataRequired are confusing, and neither represents a completely good validation. Data should probably only fail if the value is None, but it's conflated with the weird behavior of boolean fields so it also fails if the value is False (which could be a legitimate value). Input doesn't make sense at all, why would you require input but allow that input to fail to process and return None, and yet it's the one we recommend over Data because of the issue with False.

I suggest implement the latter option [mark fields and validators]. This brings flexibility for a cheap cost.

I'll have to see what this looks like, but it sounds like a decent goal. Is the idea for every field to have a multiple parameter which changes the behavior of validate to call validators (and pre and post) on each value or all the values? How are widgets handled? What if I mark a string field as multiple?

I suggest accepting and documenting the policy: process_formdata() won't cast anything silently, but process_data() will do.

I'm not sure if this distinction is good. There is a similar disconnect in Click where input was mostly assumed to be all strings (from the command line), and types and defaults were handling already converted values inconsistently. There, we're moving towards having processing always work with strings or objects the same way, regardless of the source and type.

As to the specific int issue, Click handles this the same way, it converts an actual float to an int, but a string of a float would fail to parse as an int. While I can see an argument made that the value does not match the type and so shouldn't be valid, the current behavior is consistent with how Python's int() works. I don't think that PR should be merged.

I suggest writing tests for every field, passing them python values as field default value, and make sure that the fields validate.

I agree with this, but it seems to go against your previous suggestion. Maybe I'm misunderstanding.

I suggest avoiding returning raw data in Field._value(), unless there has been errors processing the field.

I think the current behavior of preferring raw_data is so that the value a user has entered in a form doesn't change from what they entered if the form is rendered again after failing validation (on potentially an unrelated field). How it's processed is an entirely server-side, internal thing. Besides user confusion, I could also imagine changing the value to some intermediate representation could cause subtle issues if validation fails halfway and results in a value that doesn't get processed the same way when submitted again.

azmeuk · 2020-11-04T21:50:42Z

Is the idea for every field to have a multiple parameter [...]?

Yes

I suggest writing tests for every field, passing them python values as field default value, and make sure that the fields validate.

I agree with this, but it seems to go against your previous suggestion. Maybe I'm misunderstanding.

Actually both points are related. In the latter I meant we should write some tests that pass already coerced data to process_data.

I suggest avoiding returning raw data in Field._value(), unless there has been errors processing the field.

I think the current behavior of preferring raw_data is so that the value a user has entered in a form doesn't change from what they entered if the form is rendered again after failing validation (on potentially an unrelated field). How it's processed is an entirely server-side, internal thing. Besides user confusion, I could also imagine changing the value to some intermediate representation could cause subtle issues if validation fails halfway and results in a value that doesn't get processed the same way when submitted again.

Right. To reformulate: I suggest avoiding returning raw data in Field._value(), unless there has been errors processing or validating the field.

azmeuk added question Question about WTForms behavior refactoring Does the same thing but in a different way labels Oct 6, 2020

azmeuk added this to the 3.0.0 milestone Oct 6, 2020

azmeuk pinned this issue Nov 5, 2020

davidism mentioned this issue Apr 8, 2021

BooleanField not populated by obj if empty formdata given #686

Closed

azmeuk mentioned this issue May 19, 2021

ETA for new release tags #691

Closed

azmeuk mentioned this issue Dec 23, 2021

Amend InputRequired validator to support multi-input fields #716

Closed

4 tasks

Hvass-Labs mentioned this issue Feb 19, 2023

Improving the documentation #772

Closed

This was referenced Jul 21, 2023

Only first item of a field list is validated #713

Open

Multiple fields support #799

Open

tspng mentioned this issue Feb 5, 2024

Cannot update field data on numeric fields when formdata is processed #834

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Field data overhaul #662

Field data overhaul #662

azmeuk commented Oct 6, 2020 •

edited

Loading

davidism commented Nov 4, 2020 •

edited

Loading

azmeuk commented Nov 4, 2020

Field data overhaul #662

Field data overhaul #662

Comments

azmeuk commented Oct 6, 2020 • edited Loading

How things works now

input

process

validate

output

Problems

both process_formdata() and process_data() are executed

validation is always attempted

validators handle one single value

should process_formdata() and process_data() be flexible with the input?

python input types are not always handled correctly

Field._value() makes false assumptions

davidism commented Nov 4, 2020 • edited Loading

azmeuk commented Nov 4, 2020

azmeuk commented Oct 6, 2020 •

edited

Loading

both `process_formdata()` and `process_data()` are executed

should `process_formdata()` and `process_data()` be flexible with the input?

davidism commented Nov 4, 2020 •

edited

Loading