Iterating across a dictionary in Python

GBMedusa

Guest
I have several sites from which I want to get information with BeautifulSoup. The list of sites will change as time goes on. Also, some sites may not have relevant information at the time.

I fill in the urls of relevant sites and leave the others blank.

site1 = "http://www.some-site.com"
site2 = "http://www.another-site.com"
site3 = ""
site4 = "http://www.still-another-site.com"


I can manually get the "soup" from each url with these. I have except statements to handle the missing urls.

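import requests
from bs4 import BeautifulSoup

hdr = {"User-Agent": "Mozilla/5.0"}  # placeholder; the real headers dict is not shown here

try:
    source_site1 = requests.get(site1, headers=hdr).text
    soup_site1 = BeautifulSoup(source_site1, "lxml")
except requests.exceptions.RequestException:
    soup_site1 = None  # blank or unreachable URL
try:
    source_site2 = requests.get(site2, headers=hdr).text
    soup_site2 = BeautifulSoup(source_site2, "lxml")
except requests.exceptions.RequestException:
    soup_site2 = None
try:
    source_site3 = requests.get(site3, headers=hdr).text
    soup_site3 = BeautifulSoup(source_site3, "lxml")
except requests.exceptions.RequestException:
    soup_site3 = None
try:
    source_site4 = requests.get(site4, headers=hdr).text
    soup_site4 = BeautifulSoup(source_site4, "lxml")
except requests.exceptions.RequestException:
    soup_site4 = None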


Is there a way to group the urls and iterate across the group to get the "soup" for each site without having to hard-code each one (like above)?

If not, is there another method that will give the same results (a separate "soup" variable for each site)?

I tried this but the "for" loop is stumping me.

site_list_dict = {
    "site1": "http://www.some-site1.com",
    "site2": "http://www.some-site2.com",
    "site3": "",
    "site4": "http://www.some-site4.com",
}

# (This will not work because "soup" is overwritten on each pass. I would need a unique name for
# each "soup" variable, preferably associated with the site key.)
for key, value in site_list_dict.items():
    try:
        source = requests.get(value, headers=hdr).text
        soup = BeautifulSoup(source, "lxml")
    except requests.exceptions.RequestException:
        continue  # blank or unreachable URL
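
One possible way around the overwriting problem, offered as a rough sketch rather than a definitive answer: instead of a separately named variable per site, keep the soups in a second dictionary keyed by the same site names. The helper names fetch_soup and build_soups are hypothetical, and the 10-second timeout is an assumption, not something from the post above. Blank URLs are skipped before any request is made, and a failed request simply leaves that site out of the result.

```python
def fetch_soup(url, headers=None):
    """Download url and parse it; return None if the request fails."""
    # Third-party imports live inside the function so that build_soups
    # below can also be tried out with a stand-in fetcher.
    import requests
    from bs4 import BeautifulSoup

    try:
        # timeout=10 is an assumed value; adjust as needed
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException:
        return None
    return BeautifulSoup(response.text, "lxml")


def build_soups(sites, headers=None, fetch=fetch_soup):
    """Return {site_name: soup} for each non-blank URL that could be fetched."""
    soups = {}
    for name, url in sites.items():
        if not url:  # blank entry: nothing relevant on this site right now
            continue
        soup = fetch(url, headers)
        if soup is not None:
            soups[name] = soup
    return soups
```

With the dictionary above, soups = build_soups(site_list_dict, headers=hdr) would give soups["site1"], soups["site2"] and so on, each associated with its site key. A site whose URL is blank, or whose request failed, has no entry at all, which can be checked with "site3" in soups.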
