Groot for Large Data Sets #52

piercefreeman · 2015-09-28T05:35:58Z

I'm using Groot with a pretty large provided data set (~100,000 objects with relationships). Some entities only have ~100 objects, but the larger ones have around 30,000. Right now parsing takes a long time, which seems to be related to -[NSManagedObject(Groot) grt_setRelationship:fromJSONDictionary:mergeChanges:error:], specifically the executeFetchRequest call inside the existingObjectsWithJSONArray method. Does anyone have suggestions for speeding up this step, perhaps at the Core Data level?

aspcartman · 2015-11-30T11:13:36Z

For large datasets it's always recommended to do things the hard way: by hand. The universality of a tool comes at a price. You should also consider using "background" contexts.

o15a3d4l11s2 · 2015-12-06T15:28:15Z

I am also interested in possible techniques for speeding up the persistence process. I tried resetting the context before persisting entities, but this did not affect the speed.

gonzalezreal · 2015-12-07T09:12:25Z

I think the performance problem lies in the structure of the data rather than in the amount of data. Of course, this becomes more evident with large datasets.

One thing that affects performance when serializing from JSON is object uniquing, because it requires fetching existing objects from the database before inserting.

If you take a look at how Groot is implemented, there are three serialization strategies:

Insert
Uniquing
Composite Uniquing

As you may guess, the first one is the most performant because it does not fetch from the database at all. If you know that there is no duplicate data in your data set, DO NOT set identityAttributes in your entity. This will make Groot use the Insert strategy.

Groot will pick the Uniquing strategy if the identityAttributes annotation has a single attribute; otherwise, it will pick the Composite Uniquing strategy.
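A minimal sketch of the selection rule described above, written in plain Swift. The enum and function names are hypothetical and this is not Groot's actual source; it assumes the identityAttributes annotation has already been parsed into a list of attribute names.

```swift
// Hypothetical mirror of the strategy-selection rule; not Groot's implementation.
enum SerializationStrategy: Equatable {
    case insert             // no identityAttributes: never fetches
    case uniquing           // exactly one identity attribute
    case compositeUniquing  // two or more identity attributes
}

/// Picks a strategy from an entity's (already parsed) identityAttributes list.
func strategy(forIdentityAttributes identityAttributes: [String]) -> SerializationStrategy {
    switch identityAttributes.count {
    case 0:
        return .insert
    case 1:
        return .uniquing
    default:
        return .compositeUniquing
    }
}

print(strategy(forIdentityAttributes: []))                 // prints "insert"
print(strategy(forIdentityAttributes: ["identifier"]))     // prints "uniquing"
print(strategy(forIdentityAttributes: ["userID", "date"])) // prints "compositeUniquing"
```

The point of the sketch is that the cheapest path is chosen purely by leaving the annotation empty, which is why removing identityAttributes is the first thing to try when you know the data has no duplicates.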

The Uniquing strategy requires one fetch for every array of JSON objects, whereas the Composite Uniquing strategy requires one fetch for every single JSON object (it is potentially the slowest of the three strategies).
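To make those fetch counts concrete, here is a toy simulation in plain Swift. CountingStore and the three import functions are hypothetical stand-ins, not Core Data or Groot APIs: each "fetch" is just a filter over an in-memory array, and updating already-existing objects is left out. The single-attribute version collects every identity value in the array and issues one fetch (the equivalent of a "key IN ids" predicate), while the composite version issues one fetch per JSON object.

```swift
/// Toy in-memory store that counts fetch requests.
/// Hypothetical stand-in for a managed object context; NOT the Core Data API.
final class CountingStore {
    private(set) var rows: [[String: String]] = []
    private(set) var fetchCount = 0

    /// Simulates one fetch request: returns every row matching `predicate`.
    func fetch(where predicate: ([String: String]) -> Bool) -> [[String: String]] {
        fetchCount += 1
        return rows.filter(predicate)
    }

    func insert(_ row: [String: String]) {
        rows.append(row)
    }
}

/// Insert: no identity attributes, so nothing is fetched.
func importInsert(_ json: [[String: String]], into store: CountingStore) {
    for object in json {
        store.insert(object)
    }
}

/// Uniquing: one fetch for the whole array, matching all identity values at once.
func importUniquing(_ json: [[String: String]], key: String, into store: CountingStore) {
    let ids = Set(json.compactMap { $0[key] })
    let existing = store.fetch { row in row[key].map { ids.contains($0) } ?? false }
    var known = Set(existing.compactMap { $0[key] })
    for object in json {
        guard let id = object[key] else { continue }
        // Insert only unseen identities; a real importer would update the match instead.
        if known.insert(id).inserted {
            store.insert(object)
        }
    }
}

/// Composite uniquing: one fetch per JSON object, matching every identity attribute.
func importCompositeUniquing(_ json: [[String: String]], keys: [String], into store: CountingStore) {
    for object in json {
        let matches = store.fetch { row in keys.allSatisfy { row[$0] == object[$0] } }
        if matches.isEmpty {
            store.insert(object)
        }
    }
}

let batch: [[String: String]] = (0..<1_000).map { ["id": String($0), "kind": "a"] }

let insertStore = CountingStore()
importInsert(batch, into: insertStore)

let uniquingStore = CountingStore()
importUniquing(batch, key: "id", into: uniquingStore)

let compositeStore = CountingStore()
importCompositeUniquing(batch, keys: ["id", "kind"], into: compositeStore)

print(insertStore.fetchCount, uniquingStore.fetchCount, compositeStore.fetchCount)
// prints "0 1 1000"
```

Under this simplified model, an entity with around 30,000 objects would cost a single fetch per array with one identity attribute but roughly 30,000 fetches with a composite identity, which is why the number of identity attributes matters more than the raw object count.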

I hope this sheds some light on the subject.

gonzalezreal added the question label Dec 7, 2015
