Dhrubajyoti Ghosh
Merging lecture notes
A friend of mine often wants to have lecture notes from various courses merged into a single PDF. The most recent was from this course page on elliptic curves.
To download the relevant files, create an empty directory, change into it, and run
wget -r -np -nd -A "*Slides*" https://math.mit.edu/classes/18.783/2022/lectures.html
(Note: this relies on the slide files being named ...LectureSlides...; otherwise the glob pattern *Slides* won't match.)
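To see which names the pattern lets through, here is a minimal sketch that approximates the accept pattern with a Bash case statement. The helper name matches_slides and the file name LectureNotes1.pdf are made up for illustration, and wget's own matching may differ in corner cases.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the name matches the glob *Slides*.
# This mimics the accept pattern with shell globbing; it does not call wget.
matches_slides() {
    case "$1" in
        *Slides*) return 0 ;;
        *)        return 1 ;;
    esac
}

# LectureNotes1.pdf is an invented name used only to show a rejection.
for name in LectureSlides1.pdf LectureNotes1.pdf; do
    if matches_slides "$name"; then
        echo "$name: accepted"
    else
        echo "$name: rejected"
    fi
done
```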
For the interested, -r is --recursive, -np is --no-parent, -nd is --no-directories and -A is --accept.
We can't immediately use pdftk to merge the downloaded files, because they won't be merged in the correct order.
To take care of this, do
find . -name "*.pdf" | sort -V > sorted.txt
(Without the -V option to sort, filename_10.pdf will be placed before filename_2.pdf.)
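The difference between the two orderings can be checked on a few sample names without downloading anything. This is a small sketch assuming a sort that supports -V (GNU coreutils does); the helper name sample_names is made up, and LC_ALL=C is set on the plain sort only to make its byte-wise order reproducible.

```shell
#!/bin/bash
# Hypothetical helper that prints three sample paths in scrambled order.
sample_names() {
    printf '%s\n' ./LectureSlides10.pdf ./LectureSlides4.pdf ./LectureSlides1.pdf
}

echo "plain sort (byte order):"
sample_names | LC_ALL=C sort   # 1, 10, 4: "10" sorts before "4"

echo "version sort:"
sample_names | sort -V         # 1, 4, 10: digits compared as numbers
```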
The contents of sorted.txt are thus:
$ cat sorted.txt
./LectureSlides1.pdf
./LectureSlides4.pdf
./LectureSlides5.pdf
./LectureSlides6.pdf
...
./LectureSlides24.pdf
./LectureSlides25.pdf
To get the final PDF, do
pdftk $(cat sorted.txt) cat output slides.pdf
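The $(cat sorted.txt) form splits on whitespace, which is fine here because the slide names contain no spaces. If they did, one alternative is to read the list into an array first. The sketch below assumes Bash 4 or later (for mapfile); load_list and FILES are names I made up.

```shell
#!/bin/bash
# Hypothetical helper: read a list file into the global array FILES,
# one path per line, keeping any spaces inside a path intact.
load_list() {
    mapfile -t FILES < "$1"
}

# Usage, assuming sorted.txt exists and pdftk is installed:
#   load_list sorted.txt
#   pdftk "${FILES[@]}" cat output slides.pdf
```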
For fun, I tried adding bookmarks to this PDF. One can use sed to get the PDF names and pdftk to get the corresponding number of pages.
First do
pdftk slides.pdf dump_data output bookmark.txt
and then create a Bash script (alternatively, enter the lines except for the first in your terminal). Let's call it bookmark.sh.
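#!/bin/bash
# Append one top-level bookmark per source PDF to bookmark.txt.
count=1  # page of slides.pdf on which the current file starts
for f in $(cat sorted.txt)
do
    # Strip the .pdf extension, then the leading "./", to get the title.
    TITLE=$(echo "$f" | sed 's/\.pdf$//' | tail -c +3)
    # Number of pages in this file, as reported by pdftk.
    PAGES=$(pdftk "$f" dump_data | grep NumberOfPages | awk '{print $2}')
    echo "BookmarkBegin" >> bookmark.txt
    echo "BookmarkTitle: $TITLE" >> bookmark.txt
    echo "BookmarkLevel: 1" >> bookmark.txt
    echo "BookmarkPageNumber: $count" >> bookmark.txt
    count=$(( count + PAGES ))
done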
After making the script executable (chmod +x bookmark.sh), run it (./bookmark.sh).
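The title extraction in the script can also be done with basename, which drops both the directory part and a given suffix in one step. This is just an alternative sketch, not what the script above uses; title_of is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: turn a path like ./LectureSlides1.pdf into a
# bookmark title by removing the directory part and the .pdf suffix.
title_of() {
    basename "$1" .pdf
}

title_of ./LectureSlides1.pdf   # prints LectureSlides1
```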
The bookmark.txt file now looks like:
...
BookmarkBegin
BookmarkTitle: LectureSlides1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: LectureSlides4
BookmarkLevel: 1
BookmarkPageNumber: 45
BookmarkBegin
BookmarkTitle: LectureSlides5
BookmarkLevel: 1
BookmarkPageNumber: 59
...
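The page numbers are a running sum: each bookmark starts one page after the previous file ends. Here is a small sketch of that arithmetic on its own, with start_pages as a made-up helper. The counts 44 and 14 are inferred from the listing above (45 - 1 and 59 - 45), not read from the PDFs.

```shell
#!/bin/bash
# Hypothetical helper: given page counts in merge order, print the page
# of the merged PDF on which each file starts.
start_pages() {
    local count=1 pages
    for pages in "$@"; do
        echo "$count"
        count=$(( count + pages ))
    done
}

start_pages 44 14   # prints 1 then 45
```

The next file would then start at 45 + 14 = 59, which matches the entry for LectureSlides5.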
To incorporate this into slides.pdf, do
pdftk slides.pdf update_info bookmark.txt output slides_bookmarked.pdf
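To check the result, one option is to dump the metadata of the new file and list each bookmark with its page. The sketch below only parses text in the dump_data layout shown earlier (BookmarkTitle before BookmarkPageNumber); list_bookmarks is a made-up helper name.

```shell
#!/bin/bash
# Hypothetical helper: read dump_data text on stdin and print one
# "page<TAB>title" line per bookmark.
list_bookmarks() {
    awk '
        /^BookmarkTitle: /      { sub(/^BookmarkTitle: /, ""); title = $0 }
        /^BookmarkPageNumber: / { sub(/^BookmarkPageNumber: /, ""); print $0 "\t" title }
    '
}

# Usage, assuming pdftk is installed and slides_bookmarked.pdf exists:
#   pdftk slides_bookmarked.pdf dump_data | list_bookmarks
```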