How do I select specific date 'th' elements in an HTML file to web scrape in Python 3 using BeautifulSoup?


John Cook

Guest
I am just trying to scrape dates off of this web page: https://www.history.navy.mil/content/history/nhhc/research/histories/ship-histories/us-ship-force-levels.html#1886

The dates are in 'th' elements, and I am trying to get only the ones that contain a date. Maybe by using regular expressions?
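A minimal sketch of the regular-expression idea, assuming every date header looks like MM/YY (a one- or two-digit month, a slash, a two-digit year) as in the sample output further down. `DATE_PATTERN` and `is_date` are hypothetical names. Using `re.fullmatch` means the whole stripped text must be a date, so a header such as "DATE" or "BATTLESHIP" is rejected instead of partially matched.

```python
import re

# Assumption: a date header is "MM/YY", e.g. "12/86", as in the sample output.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{2}")


def is_date(text):
    """Return True if the whole string (ignoring surrounding whitespace) looks like MM/YY."""
    return DATE_PATTERN.fullmatch(text.strip()) is not None


# Header texts copied from the sample output in the question.
headers = ["DATE", "12/86", "12/87", "BATTLESHIP", "CRUISER*", "TORPEDO BOATS"]
dates = [h for h in headers if is_date(h)]
print(dates)  # ['12/86', '12/87']
```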

I know this is a very easy thing to do; however, I can't find any tutorials or forum questions that match what I am trying to do.

Any help would be super appreciated :D



import requests
from bs4 import BeautifulSoup
import re

# Download the page
r = requests.get('https://www.history.navy.mil/content/history/nhhc/research/histories/ship-histories/us-ship-force-levels.html#1886')

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(r.text, 'html.parser')

# Print the text of every header cell in every row of every table
for table in soup.find_all('table'):
    for tr in table.find_all('tr'):
        for th in tr.find_all('th'):
            print(th.text)



This is a sample of my output so far:

DATE
12/86
12/87
12/88
12/89
12/90
12/91
BATTLESHIP
CRUISER*
MONITOR
TORPEDO BOATS
STEEL GUNBOATS**
AUXILIARIES
SCREW STEAMER***



I am basically just trying to grab these dates.
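One possible approach (a sketch, not tested against the live page) is to keep looping over the `th` cells and filter each cell's stripped text with an MM/YY regular expression. `extract_dates` is a hypothetical helper, and the sample HTML is invented to mirror the shape of the output above; the real page's markup may differ.

```python
import re

from bs4 import BeautifulSoup

# Assumption: date headers look like "MM/YY", e.g. "12/86".
DATE_PATTERN = re.compile(r"\d{1,2}/\d{2}")


def extract_dates(html):
    """Return the text of every <th> whose entire stripped text looks like MM/YY."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        th.get_text(strip=True)
        for th in soup.find_all("th")
        if DATE_PATTERN.fullmatch(th.get_text(strip=True))
    ]


# Hypothetical markup shaped like the question's output, not copied from the real page.
sample = """
<table>
  <tr><th>DATE</th><th>12/86</th><th>12/87</th></tr>
  <tr><th>BATTLESHIP</th></tr>
</table>
"""
print(extract_dates(sample))  # ['12/86', '12/87']
```

To run it on the real page, pass `r.text` from the `requests.get(...)` call above to `extract_dates`. A shorter variant is `soup.find_all('th', string=re.compile(r'^\d{1,2}/\d{2}$'))`, but it tests each tag's `.string`, which is `None` when a cell has more than one child node, and stray whitespace inside the cell defeats the anchors, so the explicit `get_text(strip=True)` check is more forgiving.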
