Retrieving CSV from webpage - Find source directory of files

M.O.

Guest
I am writing a Python script that downloads information stored in CSV files. My code works, and the information is downloaded as follows:

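import csv
import requests

def Extract_Safe_Data(url, filename):
    response = requests.get(url)
    # newline='' stops csv.writer from adding blank rows on Windows
    with open(filename + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for line in response.iter_lines():
            # iter_lines() yields bytes on Python 3, so decode before str operations
            text = line.decode(response.encoding or 'utf-8')
            # decimal commas become points; fields are separated by ';'
            row = text.replace(',', '.').split(';')
            # keep the value from column 2, falling back to column 3
            if row[2] != '':
                writer.writerow([row[0], row[1], row[2]])
            elif row[3] != '':
                writer.writerow([row[0], row[1], row[3]])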


I need to download more than one type of data, so I have different functions that build the URL for each type. An example URL looks like this:

('https://website.com/cgi-bin/org/'
 'apub.something.cgi?outtype=xcl'
 '&macro=./%s/%s/cat/meas//filename.ic'
 '&from=101130&to=%s'
 '&path=/usr/dir/data/name/'
 '&lang=esp&rsrc='
 '&macropath=') % (region, idd, date.today().strftime("%y%m%d"))


The URL changes only in the cat and meas folders of the macro path, which are the category and the measurement I want for each id. Generally that is all that changes, so it's pretty easy to create the URLs.
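Since only the category and measurement folders change, one way to keep the pattern in a single place is a small helper. This is only a sketch: `build_url`, `URL_TEMPLATE`, and the parameter names are hypothetical, and the template is the placeholder URL from above, not the real one.

```python
from datetime import date

# Placeholder template copied from the example URL in the question; the real
# host, script name and paths differ. {cat} and {meas} stand in for the
# literal "cat" and "meas" folders.
URL_TEMPLATE = (
    "https://website.com/cgi-bin/org/"
    "apub.something.cgi?outtype=xcl"
    "&macro=./{region}/{idd}/{cat}/{meas}//filename.ic"
    "&from=101130&to={to}"
    "&path=/usr/dir/data/name/"
    "&lang=esp&rsrc="
    "&macropath="
)


def build_url(region, idd, cat, meas, end=None):
    """Fill the template for one id, category and measurement."""
    end = end or date.today()  # default end date is today, as in the original
    return URL_TEMPLATE.format(
        region=region, idd=idd, cat=cat, meas=meas, to=end.strftime("%y%m%d")
    )
```

Calling `build_url(region, idd, cat, meas)` for each category and measurement then gives one URL per combination, without a separate function per data type.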

My problem is that there are 10+ different measurements I need to download, and for each one I have to work out the URL pattern by hand: I turn off Wi-Fi, ask the website to download the file, copy the URL it tries to open for a few ids, and figure out the pattern from there.

It's not horrible, and it is doable, but since what I can retrieve differs for each id, I am wondering whether there is a way to inspect where the CSVs are coming from and just download everything.
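Short of finding the source directory, one hedged fallback is to try every known category/measurement pair for each id and keep the ones that return data. The sketch below rests on an unverified assumption: that the server answers with a non-200 status or an empty body when a combination does not exist. `find_available`, `make_url` and `fetch` are hypothetical names, not part of any library.

```python
from itertools import product


def find_available(ids, categories, measurements, make_url, fetch):
    """Return {id: [(cat, meas), ...]} for the combinations that yield data.

    make_url(idd, cat, meas) -> str builds the URL for one combination.
    fetch(url) -> (status_code, body_text) performs the request.
    Both are supplied by the caller.
    """
    available = {}
    for idd in ids:
        hits = []
        for cat, meas in product(categories, measurements):
            status, body = fetch(make_url(idd, cat, meas))
            # Assumption: a missing file gives a non-200 status or an empty body.
            if status == 200 and body.strip():
                hits.append((cat, meas))
        available[idd] = hits
    return available
```

With requests, `fetch` could be a small wrapper that calls `requests.get(url)` and returns `(response.status_code, response.text)`. Passing it in as an argument keeps the loop easy to test without touching the network.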

I tried requesting truncated versions of the URL (everything up to the id, and different variations of that), but it only told me the webpage doesn't exist. Trying "https://website.com/cgi-bin/org/" (not the real URL) told me I don't have access to the server.

I am not familiar with the process that retrieves the information, which seems to be driven by query parameters such as &macro= and &path=. So I don't know how to modify the URLs I have to retrieve more information, or, hopefully, to find the source directory.

Is there a way to know which files I can access with the little information I have? Or is it a futile attempt, and I need to keep doing the long, tedious process I have been doing so far?

Thank you if you've stuck with me this far; I hope everything was clear enough to follow.
