| Column | Description |
|---|---|
| filename | The original PDF filename from the ACM Digital Library |
| year | The year of publication. The CSCW 2018 online-first edition of PACMHCI is coded as 2017.5. |
| title_from_text | The paper title derived from the paper text. This may be incomplete or may also include author names. |
| lead_author | The lead author of the paper, based on the filename from the ACM DL |
| num_pages | Number of pages in the PDF |
| total_words | Total number of words in the paper, where a word is any token delimited by contiguous whitespace. |
| total_words_nopunct | Total number of words after replacing all punctuation with spaces. |
| body_len_words | Number of words in the paper's front matter and body, excluding references and appendices. Calculated with `total_words` method. |
| body_len_words_nopunct | Number of words in the paper's front matter and body, excluding references and appendices. Calculated with `total_words_nopunct` method. |
| body_len_chars | Number of characters in the paper's front matter and body, excluding references and appendices. |
| ref_len_chars | Number of characters in the paper's reference section. |
| ref_len_words | Number of words in the paper's reference section. Calculated with `total_words` method. |
| ref_len_words_nopunct | Number of words in the paper's reference section. Calculated with `total_words_nopunct` method. |
| appx_len_chars | Number of characters in the paper's appendix section. Value is nan if no appendix was found. |
| appx_len_words | Number of words in the paper's appendix section. Calculated with `total_words` method. |
| appx_len_words_nopunct | Number of words in the paper's appendix section. Calculated with `total_words_nopunct` method. |
| ref_count_approx | Approximate number of references cited. |
| words_per_page | Average number of words (`total_words` method) per page |
| words_nopunct_per_page | Average number of words (`total_words_nopunct` method) per page |
| chars_per_word | Average number of characters per word (`total_words` method) |
| chars_per_word_nopunct | Average number of characters per word (`total_words_nopunct` method) |
| body_words_nopunct_per_ref_count | Average number of body words (`body_len_words_nopunct`) per reference cited (`ref_count_approx`). |
| title_has_quote | 1 if the title contains a single or double quotation mark, 0 otherwise |
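
The two word-counting methods referenced throughout the table can be illustrated with a minimal Python sketch. This is not the code used to build the dataset. The function names simply mirror the column names, and the set of characters treated as punctuation (ASCII `string.punctuation`) is an assumption, since the table does not specify it.

```python
import string


def total_words(text: str) -> int:
    """Count tokens separated by runs of whitespace."""
    # str.split() with no arguments splits on any contiguous whitespace
    # and discards empty strings.
    return len(text.split())


def total_words_nopunct(text: str) -> int:
    """Count tokens after replacing punctuation with spaces."""
    # Assumption: "punctuation" means ASCII string.punctuation only.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return len(text.translate(table).split())


sample = "Peer-reviewed work, e.g. CSCW papers."
print(total_words(sample))          # 5
print(total_words_nopunct(sample))  # 7
```

Under this reading, hyphenated words and abbreviations such as "e.g." split into several tokens in the `nopunct` count, so `total_words_nopunct` is generally greater than or equal to `total_words` for the same text.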