1. Download HTML files by using:
wget -r -l1 <URL>
2. Generate one big file:
cat /your/path/to/html/*.html > t.txt
3. The following method only works for content lines that start with special_string and end with control_string:
grep "special_string" t.txt > tt.txt
sed 's/special_string//g' tt.txt > ttt.txt
sed 's/control_string//g' ttt.txt > your_file_name.txt
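Steps 2 and 3 can also be chained into one pipeline, which avoids the intermediate files t.txt, tt.txt and ttt.txt. The following is only a sketch: the function name extract_lines is made up for illustration, and it assumes special_string and control_string contain no characters that are special to grep or sed (such as . * [ ] or /). Like the commands above, it keeps any line that contains special_string, not only lines that begin with it.

```shell
#!/bin/sh
# extract_lines START END FILE...
# (hypothetical helper, not part of any standard tool)
# Keeps only the lines that contain START, then deletes START and END
# from those lines. START and END are treated as basic regular
# expressions, so escape special characters before passing them in.
extract_lines() {
    start=$1
    end=$2
    shift 2
    # cat joins all input files, grep selects the content lines,
    # sed strips both marker strings in a single pass.
    cat "$@" | grep "$start" | sed -e "s/$start//g" -e "s/$end//g"
}
```

Example call, using the same paths as above:

extract_lines 'special_string' 'control_string' /your/path/to/html/*.html > your_file_name.txt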