
[BUG] - Hashtag search cannot fetch all videos, even when looping #1175

Open
zhangzyg opened this issue Jul 27, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@zhangzyg

I want to search for Ukraine-related videos in the America region, but it seems I can only fetch 30 to 50 records, even though TikTok itself shows 7.1M records for this hashtag. Could we download all of them, or is there any way to search by time range?
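To my knowledge, TikTokApi's hashtag `videos()` call takes no date-range parameter, so a time range has to be applied client-side to each video's `createTime` (Unix seconds, read via `as_dict['createTime']` in the snippet below). A minimal sketch, assuming `YYYY/MM/DD` input dates interpreted as UTC and an inclusive end date; `day_window` and `in_date_range` are hypothetical helpers, not part of TikTokApi:

```python
from datetime import datetime, timedelta, timezone


def day_window(start: str, end: str, fmt: str = "%Y/%m/%d") -> tuple[int, int]:
    """Return a half-open [lo, hi) epoch-second window covering start..end inclusive.

    Dates are treated as UTC midnight (an assumption; change tzinfo if needed).
    """
    start_dt = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    # Add one day so the whole end date is included.
    end_dt = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc) + timedelta(days=1)
    return int(start_dt.timestamp()), int(end_dt.timestamp())


def in_date_range(video: dict, window: tuple[int, int]) -> bool:
    """True if the raw video dict's 'createTime' falls inside the window."""
    lo, hi = window
    created = int(video.get("createTime", -1))  # missing field -> treated as out of range
    return lo <= created < hi


if __name__ == "__main__":
    window = day_window("2024/07/01", "2024/07/02")
    sample = [
        {"id": "a", "createTime": 1719792000},  # 2024-07-01 00:00 UTC
        {"id": "b", "createTime": 1719964800},  # 2024-07-03 00:00 UTC, outside
    ]
    print([v["id"] for v in sample if in_date_range(v, window)])  # ['a']
```

This only discards results after they are fetched; it does not make TikTok return more or older videos.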

My code snippet:

import asyncio
import random
from datetime import datetime, timedelta

# Not shown in the issue (defined elsewhere in the reporter's project):
# helpers format_str_timestamp, construct_author_metadata, construct_publish_metadata,
# search_related_videos, and globals result, api, current_os, result_tik_id_set, ms_token.


async def search_videos_hashtag(hashtag, time_from, time_to, current_video_amount=0,
                                count=100, times=0) -> None:
    global result, api, current_os, result_tik_id_set
    format_style = '%m/%d/%y' if current_os == 'Windows' else '%Y/%m/%d'
    # asyncio.sleep instead of time.sleep so the event loop is not blocked
    await asyncio.sleep(random.randint(3, 5))
    temp = 0
    temp_video_amount = current_video_amount
    if api is not None:
        if len(api.sessions) == 0:
            await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3,
                                      headless=False)  # ms_token is None
        # Time window [time_from, time_to + 1 day], computed once rather than per video
        start_ts = format_str_timestamp(time_from, format_style)
        end_ts = int((datetime.fromtimestamp(format_str_timestamp(time_to, format_style))
                      + timedelta(days=1)).timestamp())
        async for searchRes in api.hashtag(hashtag).videos(count=count, cursor=current_video_amount):
            temp += 1
            current_video_amount += 1
            # Keep unseen videos created inside the window (parentheses allow the line break)
            if (start_ts <= searchRes.as_dict['createTime'] <= end_ts
                    and searchRes.id not in result_tik_id_set):
                author = construct_author_metadata(searchRes)
                publish = construct_publish_metadata(searchRes)
                author.append_publish(publish)
                result.append(author)
                result_tik_id_set.add(searchRes.id)
                print('append one tik tok data, current search: ' + str(current_video_amount))
        # No videos returned this round: search related videos of the ones collected so far
        if temp_video_amount == current_video_amount:
            await asyncio.sleep(random.randint(3, 5))
            video_urls = [res.publish[0].link for res in result]
            for url in video_urls:
                await search_related_videos(url, time_from, time_to, required_video_amount=count,
                                            current_video_amount=0,
                                            count=count // len(video_urls))
        # Fewer videos than requested: call again from the current offset (at most 100 times)
        if temp < count and times < 100:
            await search_videos_hashtag(hashtag, time_from, time_to, current_video_amount,
                                        count, times=times + 1)
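The recursive call above starts a fresh `videos()` iteration each time, which may cause results to repeat or stop early. A minimal sketch of the alternative: drain one async iterator, skip IDs already seen, stop at a limit, and apply an optional filter (for example a time-window check). `FakeVideo`, `fake_videos`, and `collect_unique` are hypothetical stand-ins so the sketch runs without TikTokApi or network access; with the real library the source would presumably be `api.hashtag(hashtag).videos(count=...)`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List


@dataclass
class FakeVideo:
    # Stand-in for TikTokApi's Video object; only `id` and `create_time` are used here.
    id: str
    create_time: int


async def fake_videos() -> AsyncIterator[FakeVideo]:
    # Simulates a paginated source that returns some duplicates.
    for vid in ["1", "2", "2", "3", "1", "4"]:
        yield FakeVideo(id=vid, create_time=int(vid) * 100)


async def collect_unique(
    source: AsyncIterator[FakeVideo],
    limit: int,
    keep: Callable[[FakeVideo], bool] = lambda v: True,
) -> List[FakeVideo]:
    """Drain one async iterator, skipping repeated IDs, until `limit` items are kept."""
    seen: set[str] = set()
    kept: List[FakeVideo] = []
    async for video in source:
        if video.id in seen:
            continue  # duplicate from an earlier page
        seen.add(video.id)
        if keep(video):
            kept.append(video)
            if len(kept) >= limit:
                break
    return kept


if __name__ == "__main__":
    everything = asyncio.run(collect_unique(fake_videos(), limit=10))
    print([v.id for v in everything])  # ['1', '2', '3', '4']
    recent = asyncio.run(collect_unique(fake_videos(), limit=10,
                                        keep=lambda v: v.create_time >= 200))
    print([v.id for v in recent])  # ['2', '3', '4']
```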

zhangzyg added the bug (Something isn't working) label on Jul 27, 2024
@sameerahmedcls

Can you send your full code?

@vagvalas

vagvalas commented Aug 26, 2024

I can confirm this problem too. Before 6.4 (on 6.3.0) it could not get past 45 videos. With 6.4 and later we can finally fetch more (I reached 340 videos), but it loops through the same videos again and again: yt-dlp, to which I pass each fetched URL, keeps reporting that the video has already been downloaded.
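A likely explanation for the repeats is cursor pagination: each page carries a cursor pointing at the next one, and a fresh iteration starts again from the beginning. `Hashtag.videos()` accepts a `cursor` argument (the first snippet in this issue passes one), which suggests it works this way, though that is an inference rather than something confirmed in this thread. The sketch below models the pattern offline; `fake_fetch`, `paginate`, and the `itemList`/`hasMore`/`cursor` keys are assumptions for illustration, not TikTokApi's public API.

```python
from typing import Any, Callable, Dict, Iterator

# Assumed page shape (not verified against TikTok's API):
# {"itemList": [...], "hasMore": bool, "cursor": int}
Page = Dict[str, Any]

DATA = [f"v{i}" for i in range(7)]  # pretend server-side result set


def fake_fetch(cursor: int, page_size: int = 3) -> Page:
    """Hypothetical page fetcher: returns up to `page_size` items starting at `cursor`."""
    items = DATA[cursor:cursor + page_size]
    next_cursor = cursor + len(items)
    return {"itemList": items, "hasMore": next_cursor < len(DATA), "cursor": next_cursor}


def paginate(fetch: Callable[[int], Page], start_cursor: int = 0) -> Iterator[Any]:
    """Yield every item once by following the cursor each page returns."""
    cursor = start_cursor
    while True:
        page = fetch(cursor)
        items = page.get("itemList", [])
        yield from items
        if not page.get("hasMore") or not items:
            return  # stop when no more pages are reported (or an empty page arrives)
        cursor = page["cursor"]


if __name__ == "__main__":
    print(list(paginate(fake_fetch)))
    # ['v0', 'v1', 'v2', 'v3', 'v4', 'v5', 'v6']
    # Calling paginate(fake_fetch) again restarts at cursor 0 and yields the same
    # items, which is the effect of re-creating the iterator inside a loop.
```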

Here is my code:

from TikTokApi import TikTokApi
from yt_dlp import YoutubeDL
import asyncio
import os
from TikTokApi.exceptions import EmptyResponseException, TikTokException

ms_token = os.environ.get("multi_sids")  # read the token from the environment instead of hardcoding it
ydl_opts = {
    'outtmpl': '%(uploader)s_%(id)s_%(timestamp)s.%(ext)s',
}

async def download_hashtag_videos(hashtag):
    async with TikTokApi() as api:
        try:
            await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3,
                                      headless=False, suppress_resource_load_types=["image", "media", "font", "stylesheet"])

            tag = api.hashtag(name=hashtag)
            # Iterate tag.videos() once. Each new call starts from the default cursor
            # (the first page), so wrapping it in a `while` loop re-fetches the same videos.
            video_list = []
            seen_ids = set()
            async for video in tag.videos(count=5000):
                if video.id not in seen_ids:  # skip duplicates across pages
                    seen_ids.add(video.id)
                    video_list.append(video)

            # One YoutubeDL instance reused for every download
            with YoutubeDL(ydl_opts) as ydl:
                for video in video_list:
                    print(f"Username: {video.author.username}")
                    print(f"Video ID: {video.id}")
                    print(f"Stats: {video.stats}")

                    video_url = f"https://www.tiktok.com/@{video.author.username}/video/{video.id}"
                    try:
                        ydl.download([video_url])
                    except Exception as e:
                        print(f"Error downloading video {video.id}: {e}")

        except EmptyResponseException as e:
            print(f"EmptyResponseException: {e}")
        except TikTokException as e:
            print(f"TikTokException: {e}")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    hashtag = 'coldplayathens'
    asyncio.run(download_hashtag_videos(hashtag))    
   

TikTokApi: 6.5.2
Python: 3.12
Playwright: 1.39.00
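Independently of the pagination question, repeated downloads can be skipped on the client side. A minimal sketch, assuming each fetched video exposes `author.username` and `id` as in the code above: build each URL once per video ID within a run, and let yt-dlp remember finished downloads across runs through its `download_archive` option (the file name `downloaded.txt` is arbitrary). `unique_video_urls` is a hypothetical helper, and `SimpleNamespace` objects stand in for TikTokApi's Video objects.

```python
from types import SimpleNamespace
from typing import Iterable, List

# yt-dlp's `download_archive` records downloaded IDs in a file and skips them later.
ydl_opts = {
    "outtmpl": "%(uploader)s_%(id)s_%(timestamp)s.%(ext)s",
    "download_archive": "downloaded.txt",  # arbitrary file name
}


def unique_video_urls(videos: Iterable) -> List[str]:
    """Build one TikTok URL per distinct video ID, preserving first-seen order."""
    seen: set[str] = set()
    urls: List[str] = []
    for video in videos:
        if video.id in seen:
            continue
        seen.add(video.id)
        urls.append(f"https://www.tiktok.com/@{video.author.username}/video/{video.id}")
    return urls


if __name__ == "__main__":
    fake = [
        SimpleNamespace(id="111", author=SimpleNamespace(username="alice")),
        SimpleNamespace(id="111", author=SimpleNamespace(username="alice")),
        SimpleNamespace(id="222", author=SimpleNamespace(username="bob")),
    ]
    print(unique_video_urls(fake))
    # With yt-dlp installed, the list could be passed to YoutubeDL(ydl_opts).download(...)
```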

@vagvalas

Besides that, it also seems to fetch videos that do not belong to the requested hashtag, for example:
https://www.tiktok.com/@tashawishesyouluck/video/7407303973583015201

This video is not even tagged with the hashtag 'coldplayathens'.
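If off-topic results like this are a problem, one workaround is to check each video's own tag list before downloading. A minimal sketch, assuming the raw item dict (for example `video.as_dict`) lists tags under `challenges` (key `title`) and/or `textExtra` (key `hashtagName`); those field names are an assumption to verify against real responses, and `hashtags_of` and `has_hashtag` are hypothetical helpers.

```python
from typing import Any, Dict, Set


def hashtags_of(raw: Dict[str, Any]) -> Set[str]:
    """Collect lowercase hashtag names from a raw video dict.

    Assumes tags appear under 'challenges' ('title') and/or 'textExtra'
    ('hashtagName'); these keys are unverified and may need adjusting.
    """
    tags = {c.get("title", "") for c in raw.get("challenges", [])}
    tags |= {t.get("hashtagName", "") for t in raw.get("textExtra", [])}
    return {t.lower() for t in tags if t}  # drop empty names, normalise case


def has_hashtag(raw: Dict[str, Any], hashtag: str) -> bool:
    """True if `hashtag` (with or without a leading '#') is among the video's tags."""
    return hashtag.lstrip("#").lower() in hashtags_of(raw)


if __name__ == "__main__":
    on_topic = {"challenges": [{"title": "ColdplayAthens"}]}
    off_topic = {"challenges": [{"title": "fyp"}], "textExtra": []}
    print(has_hashtag(on_topic, "coldplayathens"), has_hashtag(off_topic, "coldplayathens"))
    # True False
```

Inside the fetch loop this would be used along the lines of `if has_hashtag(video.as_dict, hashtag): ...`.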
