Dhrubajyoti Ghosh

Merging lecture notes

A friend of mine often wants lecture notes from various courses merged into a single PDF. The most recent request was for the slides on this course page on elliptic curves.

To download the relevant files, create an empty directory and enter


```bash
wget -r -np -nd -A "*Slides*" https://math.mit.edu/classes/18.783/2022/lectures.html
```
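(Note: this relies on the slide files having names that contain Slides, such as LectureSlides1.pdf. Otherwise the pattern *Slides* won't match them. The pattern is a shell-style wildcard, not a regular expression, and it is quoted so that your own shell passes it to wget unexpanded.)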

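For those interested: -r is --recursive (follow the links on the page), -np is --no-parent (never ascend above the starting directory), -nd is --no-directories (save every file into the current directory instead of recreating the server's directory tree) and -A is --accept (keep only files whose names match the given comma-separated suffixes or patterns).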
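To make the wildcard point concrete, here is a small shell sketch. It uses the shell's own `case` pattern matching as a stand-in for the way wget checks an -A pattern against file names. This is an approximation, not wget's code. The helper name `accepts` is hypothetical, and every file name except LectureSlides1.pdf is made up.

```bash
# Hypothetical stand-in for wget's --accept check on a single file name.
# In the pattern *Slides*, each "*" matches any run of characters (including
# none), as in a shell glob; it is not a regex "repeat the previous item".
accepts() {
    case $1 in
        *Slides*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in LectureSlides1.pdf LectureNotes1.pdf Slides.pdf lectures.html; do
    if accepts "$name"; then echo "keep $name"; else echo "skip $name"; fi
done
```

Note that Slides.pdf is kept, because each `*` may match nothing at all.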

We can't immediately pass the downloaded files to pdftk as *.pdf, because the shell expands that pattern in alphabetical (character-by-character) order, not in lecture order. To take care of this, do


```bash
# '>' (not '>>') overwrites sorted.txt, so running this again doesn't duplicate entries
find . -name "*.pdf" | sort -V > sorted.txt
```
(The -V option makes sort compare the numbers inside the names numerically. Without it, names are compared character by character, and filename_10.pdf is placed before filename_2.pdf.)
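Here is a quick way to see the difference on three of the lecture names, given in an arbitrary order. The sketch assumes GNU sort from coreutils, where -V is --version-sort. Setting LC_ALL=C pins the plain sort to byte order, so that result doesn't depend on your locale.

```bash
# Three lecture names in an arbitrary order, one per line.
names='LectureSlides10.pdf
LectureSlides4.pdf
LectureSlides1.pdf'

echo "plain sort, C locale (character by character):"
printf '%s\n' "$names" | LC_ALL=C sort    # -> 1, 10, 4
echo "version sort (numbers compared as numbers):"
printf '%s\n' "$names" | sort -V          # -> 1, 4, 10
```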

The contents of sorted.txt are thus


```
$ cat sorted.txt
./LectureSlides1.pdf
./LectureSlides4.pdf
./LectureSlides5.pdf
./LectureSlides6.pdf
  ...
./LectureSlides24.pdf
./LectureSlides25.pdf
```

To get the final PDF, do


```bash
pdftk $(cat sorted.txt) cat output slides.pdf
```
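`$(cat sorted.txt)` works here because none of the file names contain spaces. The shell splits the output of an unquoted command substitution on whitespace. The sketch that follows shows what would happen with a hypothetical name containing a space. It also shows a portable alternative that reads the list line by line into the positional parameters (`"$@"`), which keeps each name whole. The temporary file and the name "Lecture Slides1.pdf" are made up for the demonstration.

```bash
# Hypothetical list in which one name contains a space.
list=$(mktemp)
printf '%s\n' './Lecture Slides1.pdf' './LectureSlides4.pdf' > "$list"

# Unquoted $(cat ...) is split on whitespace: 3 arguments for 2 files.
set -- $(cat "$list")
split_count=$#

# Reading line by line and appending to "$@" keeps every name intact.
set --
while IFS= read -r f; do
    set -- "$@" "$f"
done < "$list"
line_count=$#
# The merge step would then be: pdftk "$@" cat output slides.pdf

rm -f "$list"
echo "split: $split_count arguments, line by line: $line_count arguments"
```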
            

For fun, I tried adding bookmarks to this PDF. One can use sed to get the PDF names and pdftk to get the corresponding number of pages.

First do


```bash
pdftk slides.pdf dump_data output bookmark.txt
```
and then create a Bash script (alternatively, enter every line except the first directly in your terminal). Let's call it bookmark.sh.

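```bash
#!/bin/bash
# Append one level-1 bookmark per lecture to bookmark.txt.
# Run it from the directory that contains sorted.txt and the downloaded PDFs.

count=1   # page of slides.pdf on which the current lecture starts
while IFS= read -r f
do
    # ./LectureSlides4.pdf -> LectureSlides4 (strip the leading ./ and the .pdf suffix)
    TITLE=$(echo "$f" | sed 's|^\./||; s|\.pdf$||')
    # pdftk reports the page count on a line of the form "NumberOfPages: N"
    PAGES=$(pdftk "$f" dump_data | grep NumberOfPages | awk '{print $2}')
    {
        echo "BookmarkBegin"
        echo "BookmarkTitle: $TITLE"
        echo "BookmarkLevel: 1"
        echo "BookmarkPageNumber: $count"
    } >> bookmark.txt
    count=$(( count + PAGES ))   # the next lecture starts right after this one
done < sorted.txt
```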
After making the script executable (chmod +x bookmark.sh), run it once (./bookmark.sh). It appends to bookmark.txt, so a second run would add every bookmark twice. The bookmark.txt file now looks like

```
...
BookmarkBegin
BookmarkTitle: LectureSlides1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: LectureSlides4
BookmarkLevel: 1
BookmarkPageNumber: 45
BookmarkBegin
BookmarkTitle: LectureSlides5
BookmarkLevel: 1
BookmarkPageNumber: 59
...
```
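The page numbers come from a running sum: each lecture starts one page after the previous lecture ends. So LectureSlides4 starting on page 45 means LectureSlides1 has 44 pages, and LectureSlides5 starting on page 59 means LectureSlides4 has 14. Here is a sketch of that arithmetic on its own, with no PDFs or pdftk involved. The helper name `bookmark_records` and its "TITLE PAGES" input format are hypothetical, and the page count given for LectureSlides5 is made up.

```bash
# Hypothetical helper: for each "TITLE PAGES" line on stdin, print one pdftk
# bookmark record whose page number is 1 + the pages of all earlier lectures.
# (Titles must not contain spaces in this simple input format.)
bookmark_records() {
    local count=1 title pages
    while read -r title pages; do
        printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: %d\n' \
            "$title" "$count"
        count=$(( count + pages ))
    done
}

# 44 and 14 pages are implied by the excerpt above; 20 is made up.
printf '%s\n' 'LectureSlides1 44' 'LectureSlides4 14' 'LectureSlides5 20' | bookmark_records
```

The output reproduces the three records shown above, with page numbers 1, 45 and 59. Keeping the sum separate from the pdftk calls makes it easy to check by hand.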
    

To write a bookmarked copy of slides.pdf, do

```bash
pdftk slides.pdf update_info bookmark.txt output slides_bookmarked.pdf
```