Scraper and parser that exports CHF/PLN pair data from most major banks in Poland to Excel format
The scraping code is messy, but it produces 100% correct output. It can even parse PDFs from Deutsche Bank.
Sometimes you need to fiddle with the code or run the scripts in the correct order. The basic flow is:
- scrape data (JSON, HTML, XML, PDFs)
- parse data
For some banks it is:
- scrape the available times/dates
- scrape again using those times or dates
- process the results into a data list/pandas DataFrame
- convert to Excel
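The second, two-step flow can be sketched roughly as follows. This is only an illustration, not the code from this repository: `fetch_available_dates` and `fetch_rate` are hypothetical stand-ins that return canned placeholder values instead of calling a bank, and the column names and output file name are made up.

```python
import pandas as pd


def fetch_available_dates():
    # Hypothetical stand-in: a real scraper would request the bank's list
    # of published dates here (JSON, HTML, XML, ...).
    return ["2022-10-03", "2022-09-29", "2022-09-30"]


def fetch_rate(date):
    # Hypothetical stand-in: a real scraper would download and parse the
    # table for one date. The numbers are placeholders, not real quotes.
    return {"date": date, "buy": 1.0, "sell": 2.0}


def build_frame():
    # Step 1: find out which dates exist. Step 2: scrape each of them.
    rows = [fetch_rate(d) for d in fetch_available_dates()]
    # Step 3: collect the rows into a DataFrame indexed and sorted by date.
    frame = pd.DataFrame(rows)
    frame["date"] = pd.to_datetime(frame["date"])
    return frame.set_index("date").sort_index()


def export(frame, path="chf_pln.xlsx"):
    # Step 4: write the spreadsheet. pandas needs an Excel engine such as
    # openpyxl installed to write .xlsx files.
    frame.to_excel(path)
```

Calling `export(build_frame())` would then write the spreadsheet. Keeping the scraping and the Excel export in separate functions mirrors the scrape-then-process split described above.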
If you lack basic coding skills you will probably have trouble running it (contact me for ready-made Excel spreadsheets or help; I had no reason to clean up the code).
Some of the code looks async but is not; either way, downloading all the data takes a while.
To run it:
- create a Python virtual environment
- install dependencies with
pip install -r requirements.txt
- enter the folder of the bank you want
- create a directory named dane inside it
- set the date range you want to download by editing this line:
dates = pd.date_range(start="2021-03-11", end="2022-10-03").to_pydatetime().tolist()
- start scraping with
python scrap.py
- once all the data has been scraped, run
python process.py
(or a similarly named script; some banks are harder to parse and need more scripts to be run)
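For reference, the `dates` line in the steps above expands to one `datetime` per calendar day, with both endpoints included. A small sketch of that, plus creating the data directory from Python instead of by hand. The helper names `make_dates` and `ensure_data_dir` are made up for this example, placing the directory inside the bank folder is an assumption, and the business-days option is only a suggestion, not something the scripts are known to do.

```python
from pathlib import Path

import pandas as pd


def make_dates(start, end, business_days_only=False):
    # freq="D" yields every calendar day; freq="B" skips Saturdays and
    # Sundays (it does not know about public holidays).
    freq = "B" if business_days_only else "D"
    return pd.date_range(start=start, end=end, freq=freq).to_pydatetime().tolist()


def ensure_data_dir(bank_folder):
    # Assumption: the scripts expect a "dane" directory inside the bank
    # folder. Create it if it does not exist yet.
    data_dir = Path(bank_folder) / "dane"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


dates = make_dates("2021-03-11", "2022-10-03")
print(len(dates), dates[0].date(), dates[-1].date())
# 572 2021-03-11 2022-10-03
```

Passing `business_days_only=True` shortens the list, which may save requests if a bank publishes nothing on weekends.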