No metadata for search results #84

Open
Fallenstedt opened this issue Jul 18, 2017 · 5 comments

Comments

@Fallenstedt

Twitter's API allows you to receive metadata for search results.

Using ExTwitter.search I am able to search for tweets. As an example, I can search for 120 tweets about pizza near Portland with:

  def search(topic, count, radius) do
    options = [
      count: count,
      lang: "en",
      geocode: "45.5231,-122.6765,#{radius}mi",
      result_type: "recent"
    ]

    ExTwitter.search("#{topic}", options)
    |> IO.inspect
  end

Running MyModule.search("pizza", 120, 500) |> Enum.count in the console shows that only 100 tweets are returned, even though 120 were requested.

Nowhere in this list is there any search metadata with a next_results token that I could use to obtain the next twenty tweets. In the Twitter API, the response should carry metadata that looks something like this:

  "search_metadata": {
    "max_id": 250126199840518145,
    "since_id": 24012619984051000,
    "refresh_url": "?since_id=250126199840518145&q=%23freebandnames&result_type=mixed&include_entities=1",
    "next_results": "?max_id=249279667666817023&q=%23freebandnames&count=4&include_entities=1&result_type=mixed",
    "count": 4,
    "completed_in": 0.035,
    "since_id_str": "24012619984051000",
    "query": "%23freebandnames",
    "max_id_str": "250126199840518145"
  }

My suspicion is that this metadata is dropped when the results are parsed. I've forked this library and tested searching without parsing, and in that case I do have access to the metadata.

Is there currently a way to parse the JSON so that it includes the search_metadata? If not, how can I contribute? I would love a feature that lets me page through my data, because right now I am limited to 100 results when I may need thousands.
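
For illustration, here is a minimal sketch of how a next_results value like the one above could be turned into request options once the metadata is available. NextResults and its whitelist of keys are hypothetical names made up for this example and are not part of ExTwitter; only URI.decode_query/1 from the Elixir standard library is used.

```elixir
defmodule NextResults do
  # Hypothetical helper, not part of ExTwitter.
  # Only keys in this whitelist are converted to atoms, so arbitrary
  # input cannot create new atoms at runtime.
  @known_keys ~w(max_id since_id q count include_entities result_type lang geocode)

  # Drop the leading "?" that Twitter puts in front of the query string.
  def to_options("?" <> query), do: to_options(query)

  def to_options(query) when is_binary(query) do
    decoded = URI.decode_query(query)

    for {key, value} <- decoded, key in @known_keys do
      {String.to_atom(key), value}
    end
  end
end

next_results =
  "?max_id=249279667666817023&q=%23freebandnames&count=4&include_entities=1&result_type=mixed"

options = NextResults.to_options(next_results)
IO.inspect(Keyword.get(options, :max_id))
IO.inspect(Keyword.get(options, :q))
```

All values come back as decoded strings (for example, %23 becomes #), so numeric options such as count would still need converting before use.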

@parroty
Owner

parroty commented Jul 30, 2017

Thanks for the comment. As you indicated, the metadata is currently excluded when the results of the search API are parsed.

I'm starting to wonder whether an option could be added to allow access to the search_metadata, along with a helper to fetch the next page from the previous result.

If you have any opinions on the interface, I would appreciate it if you could share them (I'll keep thinking about it).

prev_response = ExTwitter.search("pizza", [count: 100, search_metadata: true])
response = ExTwitter.search_next_page(prev_response.metadata)

defmodule Searcher do
  def search_next_page(prev_response, index) do
    IO.puts("Fetching page " <> to_string(index))
    response = ExTwitter.search_next_page(prev_response.metadata)
    if response != nil do
      prev_response.statuses ++ search_next_page(response, index + 1)
    else
      prev_response.statuses
    end
  end
end

response = ExTwitter.search("pizza", [count: 100, search_metadata: true])
tweets = Searcher.search_next_page(response, 1)
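
As a variation on the idea above, the same loop could be written with an accumulator and a page limit, so that a long result set does not recurse without bound. This is only a sketch: PageCollector and fetch_next are hypothetical names, and fetch_next stands in for the proposed ExTwitter.search_next_page/1, which does not exist yet. The stubbed pages below replace real API calls so the example runs on its own.

```elixir
defmodule PageCollector do
  # fetch_next receives the previous response's metadata and returns the
  # next response map, or nil when there are no more pages. It is a
  # stand-in for the proposed (not yet existing) search_next_page helper.
  def collect(first_response, fetch_next, max_pages \\ 10) do
    do_collect(first_response, fetch_next, max_pages, [])
  end

  # No response left, or page budget exhausted: return what we have.
  defp do_collect(nil, _fetch_next, _pages_left, acc), do: finish(acc)
  defp do_collect(_response, _fetch_next, 0, acc), do: finish(acc)

  # Last allowed page: keep its statuses but do not fetch another page.
  defp do_collect(response, _fetch_next, 1, acc) do
    finish([response.statuses | acc])
  end

  defp do_collect(response, fetch_next, pages_left, acc) do
    next_response = fetch_next.(response.metadata)
    do_collect(next_response, fetch_next, pages_left - 1, [response.statuses | acc])
  end

  # Pages were prepended, so restore order before flattening one level.
  defp finish(acc), do: acc |> Enum.reverse() |> Enum.concat()
end

# Stubbed pages keyed by a fake metadata token.
pages = %{
  "page2" => %{statuses: [:c, :d], metadata: "page3"},
  "page3" => %{statuses: [:e], metadata: nil}
}

first = %{statuses: [:a, :b], metadata: "page2"}
fetch_next = fn metadata -> Map.get(pages, metadata) end

IO.inspect(PageCollector.collect(first, fetch_next))
IO.inspect(PageCollector.collect(first, fetch_next, 2))
```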

@Fallenstedt
Author

I enjoy this interface idea. My design was going to include a struct for search_metadata and include it in response as the last item in that list.

Then Searcher.search_next_page could use this struct, which I would hope to be the last item in the response list, to fetch the next page of results.

One issue I have been having is using the search_metadata url to fetch the next page of data. I keep trying various ways to use it, however, I keep failing OAuth. I'll put together a more concrete idea later today and retrace my steps.

@gmile
Contributor

gmile commented Aug 1, 2017

What about fetching other things that require paging?

For instance, right now it's not possible to fetch more than 200 items at once (this is per Twitter API design) when fetching favorited tweets:

length(ExTwitter.favorites(count: 1000))
# => 200

So it's technically possible to fetch all favorites, but doing so requires paging manually.

@parroty would your proposed design cover this case as well, or is it only suited for search?

@parroty
Owner

parroty commented Aug 20, 2017

@Fallenstedt Thanks for the comment, and sorry for the late response.

> I enjoy this interface idea. My design was going to include a struct for search_metadata and include it in response as the last item in that list.

Because the current ExTwitter.search returns a list of tweets directly, adding this metadata gets tricky. I'm thinking of switching the response type (list or struct) based on a search_metadata option. It might not be the best way, but the following branch is a trial fix.

#86
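
To make the idea of switching the response type concrete, here is a rough sketch. It is not the code from the branch referenced above. SearchParser is a hypothetical module, and the plain map it returns is an assumption standing in for whatever struct the library would define. The "statuses" and "search_metadata" keys are the ones found in a decoded search response.

```elixir
defmodule SearchParser do
  # Hypothetical parser, not ExTwitter's actual implementation.
  # `body` is the already-decoded JSON map of a search response.
  def parse(body, options \\ []) do
    statuses = Map.get(body, "statuses", [])

    if Keyword.get(options, :search_metadata, false) do
      # Opt-in shape: tweets plus the metadata needed for paging.
      %{statuses: statuses, metadata: Map.get(body, "search_metadata")}
    else
      # Default shape: the bare list, so existing callers keep working.
      statuses
    end
  end
end

body = %{
  "statuses" => [%{"id" => 1}, %{"id" => 2}],
  "search_metadata" => %{"count" => 2, "next_results" => "?max_id=0&q=pizza"}
}

IO.inspect(SearchParser.parse(body))
IO.inspect(SearchParser.parse(body, search_metadata: true))
```

The trade-off of this design is that the return type depends on an option, so callers have to know which shape they asked for.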

@gmile Thanks for the comment. If writing the iteration yourself (as in the example above) is acceptable, I think code like the following would handle the favorites case.
The search API, by contrast, depends on search_metadata for paging, and the current ExTwitter.search does not return it (which is what the original issue is about).

defmodule FavoritesSearcher do
  def run(options) do
    do_run(options, 1)
  end

  defp do_run(options, index) do
    favorites = ExTwitter.favorites(options)
    IO.puts("Fetched page #{index} with #{Enum.count(favorites)} tweets by max_id #{Keyword.get(options, :max_id)}")
    if Enum.count(favorites) > 0 do
      favorites ++ do_run(Keyword.merge(options, [max_id: List.last(favorites).id - 1]), index + 1)
    else
      favorites
    end
  end
end

favorites = FavoritesSearcher.run(screen_name: "justinbieber", count: 200)

@gmile
Contributor

gmile commented Aug 23, 2017

@parroty nice, thanks for the code excerpt! I'll try to base my implementation on it.

Generally, do you think extwitter could benefit from including this excerpt, probably in some generic form, in the core code?

I'm trying to think of cases where manually pulling additional items is appropriate. When would a user want to run their own code in between API calls made by extwitter?

I think what really matters to the end user is the intention to just get the N results they requested (be it 5, 200, or 1000), without having to run a pull-check-pull-again mechanism themselves.

Looking at the reference, I see that different GET calls have different page_max values (100, 200, 800). I think all such calls could benefit from automatically pulling more results beyond the page_max value.
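
A generic form of this could look roughly like the sketch below. AutoPager, fetch_page, wanted, and page_max are hypothetical names chosen for illustration. fetch_page stands in for a call such as ExTwitter.favorites/1 and is assumed to accept :count and :max_id options and to return items newest first, each with an integer id. The stub at the bottom replaces the real API so the example runs on its own.

```elixir
defmodule AutoPager do
  # Pulls pages until `wanted` items are collected or the source runs dry.
  # Each request asks for at most `page_max` items.
  def take(fetch_page, wanted, page_max) do
    fetch_page
    |> do_take(wanted, page_max, nil, [])
    |> Enum.take(wanted)
  end

  defp do_take(_fetch_page, remaining, _page_max, _max_id, acc) when remaining <= 0 do
    finish(acc)
  end

  defp do_take(fetch_page, remaining, page_max, max_id, acc) do
    options = [count: min(remaining, page_max)]
    options = if max_id, do: Keyword.put(options, :max_id, max_id), else: options

    case fetch_page.(options) do
      [] ->
        # An empty page means there is nothing older left to fetch.
        finish(acc)

      page ->
        # Ask for items strictly older than the last one seen.
        next_max_id = List.last(page).id - 1
        do_take(fetch_page, remaining - length(page), page_max, next_max_id, [page | acc])
    end
  end

  # Pages were prepended, so restore order before flattening one level.
  defp finish(acc), do: acc |> Enum.reverse() |> Enum.concat()
end

# Stub data source: ten items with ids 10 down to 1.
all_items = Enum.map(Enum.reverse(1..10), fn id -> %{id: id} end)

fetch_page = fn options ->
  max_id = Keyword.get(options, :max_id, 10)

  all_items
  |> Enum.filter(fn item -> item.id <= max_id end)
  |> Enum.take(Keyword.fetch!(options, :count))
end

IO.inspect(AutoPager.take(fetch_page, 7, 3) |> Enum.map(& &1.id))
```

Rate limits are not handled here; a real implementation would presumably need to account for them between calls.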

3 participants