Dhrubajyoti Ghosh
Merging lecture notes
A friend of mine often wants to have lecture notes from various courses merged into a single PDF. The most recent was from this course page on elliptic curves.
To download the relevant files, create an empty directory and enter
wget -r -np -nd -A "*Slides*" https://math.mit.edu/classes/18.783/2022/lectures.html
(Note: I'm depending on the slide files being named ...LectureSlides...; otherwise the glob pattern *Slides* won't match.)
For the interested, -r is --recursive, -np is --no-parent, -nd is --no-directories and -A is --accept.
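As a rough illustration of what the accept pattern does, here is a minimal sketch that applies the same *Slides* glob to two file names using the shell's own pattern matching. The helper matches_slides and the name LectureNotes1.pdf are made up for the example; wget does its own matching internally.

```shell
# Illustrative only: mimic how a glob such as *Slides* selects file names.
# matches_slides is a hypothetical helper, not part of wget.
matches_slides() {
    case $1 in
        *Slides*) return 0 ;;   # name contains "Slides"
        *)        return 1 ;;   # anything else
    esac
}

# LectureNotes1.pdf is a made-up name used only to show a non-match.
for name in LectureSlides1.pdf LectureNotes1.pdf; do
    if matches_slides "$name"; then
        echo "$name: accepted"
    else
        echo "$name: rejected"
    fi
done
```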
We can't immediately use pdftk to merge the obtained files, because a plain *.pdf glob expands in lexicographic order, so a file numbered 10 would be merged before one numbered 4.
To take care of this, do
find . -name "*.pdf" | sort -V > sorted.txt
(Without the -V option, sort compares the names character by character, so filename_10.pdf will be placed before filename_2.pdf.)
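To see the difference between the two orderings, here is a small sketch on three made-up file names. It assumes GNU sort (for -V) and pins the locale to C so the plain ordering is purely byte-wise.

```shell
# Three made-up file names, only to compare the two orderings.
names='filename_1.pdf
filename_2.pdf
filename_10.pdf'

# Byte-wise sort: "10" lands before "2" because the character '1' < '2'.
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort (GNU sort -V): runs of digits are compared as numbers.
versioned=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'plain:\n%s\n\nversion:\n%s\n' "$plain" "$versioned"
```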
The contents of sorted.txt are thus
$ cat sorted.txt
./LectureSlides1.pdf
./LectureSlides4.pdf
./LectureSlides5.pdf
./LectureSlides6.pdf
...
./LectureSlides24.pdf
./LectureSlides25.pdf
To get the final PDF, do
pdftk $(cat sorted.txt) cat output slides.pdf
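The $(cat sorted.txt) form relies on word splitting, which is fine for these names but would break on file names containing spaces. A possible alternative, sketched here under the assumption that sorted.txt holds one path per line, reads the list line by line; merge_listed is a hypothetical helper name.

```shell
# Hypothetical helper: merge the PDFs listed (one per line) in a file.
# Usage: merge_listed LIST_FILE OUTPUT_PDF
merge_listed() {
    list=$1
    out=$2
    set --                          # clear the positional parameters
    while IFS= read -r f; do
        if [ -n "$f" ]; then        # skip empty lines
            set -- "$@" "$f"        # append the path as a single argument
        fi
    done < "$list"
    # Same pdftk invocation as above, with each path passed intact.
    pdftk "$@" cat output "$out"
}
```

It would be called as merge_listed sorted.txt slides.pdf.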
For fun, I tried adding bookmarks to this PDF.
One can use sed to get the PDF names and pdftk to get the corresponding number of pages.
First do
pdftk slides.pdf dump_data output bookmark.txt
and then create a Bash script (alternatively enter the lines except for the first in your terminal).
Let's call it bookmark.sh.
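#!/bin/bash
# Append one level-1 bookmark per source PDF to bookmark.txt.
count=1   # page of slides.pdf on which the current file starts
for f in $(cat sorted.txt)
do
    # Strip the leading "./" and the ".pdf" extension to get the title.
    TITLE=$(echo "$f" | sed -e 's|^\./||' -e 's|\.pdf$||')
    # Page count of this file, read from pdftk's metadata dump.
    PAGES=$(pdftk "$f" dump_data | grep NumberOfPages | awk '{print $2}')
    echo "BookmarkBegin" >> bookmark.txt
    echo "BookmarkTitle: $TITLE" >> bookmark.txt
    echo "BookmarkLevel: 1" >> bookmark.txt
    echo "BookmarkPageNumber: $count" >> bookmark.txt
    # The next file starts right after this one ends.
    count=$(( count + PAGES ))
done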
After making the script executable (chmod +x bookmark.sh), run it (./bookmark.sh).
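The page count in the script comes from the NumberOfPages line of pdftk's dump_data output. The sketch below runs the same kind of extraction on a hand-written sample, so it works without pdftk; the sample values are made up, and only the NumberOfPages line matters here.

```shell
# Sample text shaped like pdftk's dump_data output; the values are made up.
sample='InfoBegin
InfoKey: Producer
InfoValue: example
NumberOfPages: 44
PageMediaBegin
PageMediaNumber: 1'

# Print the second field of the line whose first field is "NumberOfPages:".
pages=$(printf '%s\n' "$sample" | awk '$1 == "NumberOfPages:" {print $2}')
echo "$pages"
```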
The bookmark.txt file now looks like
...
BookmarkBegin
BookmarkTitle: LectureSlides1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: LectureSlides4
BookmarkLevel: 1
BookmarkPageNumber: 45
BookmarkBegin
BookmarkTitle: LectureSlides5
BookmarkLevel: 1
BookmarkPageNumber: 59
...
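The page numbers are running sums: each bookmark points at 1 plus the total page count of all earlier files. A minimal sketch of that arithmetic follows; start_pages is a hypothetical helper, the counts 44 and 14 are the ones implied by the listing above (45 - 1 and 59 - 45), and the final 10 is invented.

```shell
# Hypothetical helper: read one page count per line on stdin and
# print the page on which each file starts in the merged PDF.
start_pages() {
    awk 'BEGIN { start = 1 } { print start; start += $1 }'
}

# 44 and 14 are implied by the listing; 10 is a made-up third count.
printf '%s\n' 44 14 10 | start_pages
```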
To incorporate this into slides.pdf, do
pdftk slides.pdf update_info bookmark.txt output slides_bookmarked.pdf