Hi Emacs community,

I’m an elisp noob, and I recently wrote a function to get the references from a Wikipedia page. I plan on using it with org-mode/org-roam so I can do research faster (even though there’s probably already a package for that sort of thing). Unfortunately, it’s not as robust as I’d like: some of the DOIs/ISBNs appear to be missing on some of the Wikipedia pages I’ve tested. Here it is for reference:

(require 'dom)

(defun get-wikipedia-references (subject)
  "Return an alist of (ID-OR-URL . TITLE) for the Wikipedia article SUBJECT."
  (let ((wikipedia-prefix-url "https://en.wikipedia.org/wiki/")
        (result nil))
    ;; `url-retrieve-synchronously' returns the response buffer; work inside it.
    (with-current-buffer (url-retrieve-synchronously (concat wikipedia-prefix-url subject))
      ;; Skip the HTTP headers: the body starts after the first empty line.
      (goto-char (point-min))
      (re-search-forward "^$")
      (let ((dom (libxml-parse-html-region (1+ (point)) (point-max))))
        (dolist (cite-tag (dom-by-tag dom 'cite) result)
          ;; A <cite> without a class attribute yields nil, so default to "".
          (let ((cite-class (or (dom-attr cite-tag 'class) "")))
            (cond ((string-search "journal" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "https://doi.org" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "doi:" (dom-text a-tag))
                                              ;; The title is the text between the first pair of double quotes.
                                              (let* ((cite-texts (dom-texts cite-tag))
                                                     (title-beg (1+ (string-search "\"" cite-texts)))
                                                     (title-end (string-search "\"" cite-texts (1+ title-beg))))
                                                (substring cite-texts title-beg title-end)))
                                        result))))
                  ((string-search "book" cite-class)
                   (let ((a-tag (dom-search cite-tag (lambda (tag) (string-prefix-p "/wiki/Special:BookSources" (dom-attr tag 'href))))))
                     (setq result (cons (cons (concat "isbn:" (dom-text (dom-child-by-tag a-tag 'bdi)))
                                              (dom-text (dom-child-by-tag cite-tag 'i)))
                                        result))))
                  ;; Anything else: fall back to the first link inside the citation.
                  (t
                   (let ((a-tag (dom-child-by-tag cite-tag 'a)))
                     (setq result (cons (cons (dom-attr a-tag 'href) (dom-text a-tag)) result)))))))))))

(get-wikipedia-references "Graph_traversal")
(("doi:10.1109/SFCS.1979.34" . "Random walks, universal traversal sequences, and the complexity of maze problems")
 ("doi:10.1016/j.tcs.2015.11.017" . "Lower and upper competitive bounds for online directed graph exploration")
 ("doi:10.1016/j.tcs.2020.06.007" . "Online graph exploration on a restricted graph class: Optimal solutions for tadpole graphs")
 ("doi:10.1587/transinf.E92.D.1620" . "The Online Graph Exploration Problem on Restricted Graphs")
 ("doi:10.1016/j.tcs.2021.04.003" . "An improved lower bound for competitive graph exploration")
 ("doi:10.1137/0206041" . "An Analysis of Several Heuristics for the Traveling Salesman Problem"))

And yes, I know that I could probably use a library like s, dash, seq, or cl, but I try to keep my elisp functions free of those kinds of things. I would appreciate any criticism from the Emacs community about my elisp!

  • nv-elisp@alien.top
    11 months ago

    You don’t have anything to guard against a bad response from the server, e.g.:

    (unless (equal url-http-response-status 200)
      (error "Server responded with status: %S" url-http-response-status))

    To position point at the end of the headers:

    (goto-char url-http-end-of-headers)


    (setq result (cons (cons ...) result))

    is more clearly expressed as:

    (push (cons ...) result)

    Better yet, you could map over the elements you’re interested in and accumulate the results via mapcar or cl-loop. That would obviate the need for the “result” variable.
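
    A minimal sketch of the mapcar approach, assuming a small hand-written DOM in place of a fetched page. The names my/sample-dom and my/cite-link are hypothetical and only serve the illustration:

```elisp
(require 'dom)

;; Hypothetical stand-in for the DOM returned by `libxml-parse-html-region'.
(defvar my/sample-dom
  '(html nil
         (body nil
               (cite ((class . "citation web"))
                     (a ((href . "https://example.org/a")) "First"))
               (cite ((class . "citation web"))
                     (a ((href . "https://example.org/b")) "Second")))))

(defun my/cite-link (cite-tag)
  "Return (HREF . TEXT) for the first <a> child of CITE-TAG."
  (let ((a-tag (dom-child-by-tag cite-tag 'a)))
    (cons (dom-attr a-tag 'href) (dom-text a-tag))))

;; `mapcar' builds the list itself, so no accumulator variable is needed.
(mapcar #'my/cite-link (dom-by-tag my/sample-dom 'cite))
;; => (("https://example.org/a" . "First") ("https://example.org/b" . "Second"))
```

    Unlike accumulating with setq/cons or push inside dolist, this keeps the entries in document order.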

    You could probably shorten things by using the dom-elements function to search directly for the hrefs you’re interested in, in combination with dom-parent to get at the parent elements.
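
    Roughly, that could look like the sketch below. It again assumes a hand-written DOM rather than a real Wikipedia page; my/doi-sample-dom and my/doi-links are hypothetical names, and the regexp is only a guess at what a DOI link looks like:

```elisp
(require 'dom)

;; Hypothetical stand-in for a parsed page with one DOI link.
(defvar my/doi-sample-dom
  '(html nil
         (body nil
               (cite ((class . "citation journal"))
                     "Doe, J. \"A Paper\". "
                     (a ((href . "https://doi.org/10.1000/xyz")) "10.1000/xyz"))
               (cite ((class . "citation web"))
                     (a ((href . "https://example.org/")) "Example")))))

(defun my/doi-links (dom)
  "Return (DOI . PARENT-CLASS) for every doi.org link in DOM."
  (mapcar (lambda (a-tag)
            (cons (concat "doi:" (dom-text a-tag))
                  ;; `dom-parent' walks DOM to find the element containing A-TAG.
                  (dom-attr (dom-parent dom a-tag) 'class)))
          ;; `dom-elements' matches a regexp against the given attribute.
          (dom-elements dom 'href "\\`https://doi\\.org/")))

(my/doi-links my/doi-sample-dom)
;; => (("doi:10.1000/xyz" . "citation journal"))
```

    On a real page the link may sit several levels below the cite element, in which case dom-parent would need to be applied repeatedly until a cite tag is reached.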

    Overall your function gets a 65 out of 130 ERU (elisp rating units).