What's your actual paper sourcing workflow when PDFs aren't available? #199881
Discussion Type
Product Feedback
Discussion Content
One thing paper-qa handles really well is the QA layer over documents you already have. What I've been trying to figure out is the step before that: actually getting the papers in the first place, especially for topics that go beyond arXiv.
For context, I was building a corpus covering biomedical and social science literature, neither of which is well represented on arXiv. PubMed helps, but full-text availability is inconsistent. I ended up trying scholarapi.net, which pulls full text directly from open access sources across 30M+ papers, so no PDF hunting and no broken links. It saved a lot of time on what is honestly the most boring part of the whole pipeline.
Curious what others are doing here. Are you mostly working with papers you already have locally, or have you found a reliable way to source at scale for domains outside the usual arXiv/CS coverage?
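For anyone comparing notes, here is roughly what the PubMed side of this can look like. It is a minimal sketch, not a recommended pipeline: it assumes the public NCBI E-utilities `esearch` endpoint and the PubMed `free full text[sb]` subset filter, and the helper names (`build_esearch_url`, `parse_pmids`, `search_pubmed`) are made up for illustration. It only returns PubMed IDs; fetching the full text itself is a separate step.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (public; rate limits apply).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(topic, max_results=20):
    """Build a PubMed search URL restricted to records with free full text.

    The "free full text[sb]" filter is an assumption about how you want to
    narrow results; drop it to search all of PubMed.
    """
    term = f"({topic}) AND free full text[sb]"
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(payload):
    """Pull the list of PubMed IDs out of a decoded esearch JSON response."""
    return payload.get("esearchresult", {}).get("idlist", [])


def search_pubmed(topic, max_results=20):
    """Run the search over the network and return matching PubMed IDs."""
    with urlopen(build_esearch_url(topic, max_results), timeout=30) as resp:
        return parse_pmids(json.load(resp))


# Example (needs network access):
#   pmids = search_pubmed("social determinants of health", max_results=5)
```

Keeping URL building and response parsing separate from the network call makes it easy to check the query logic offline before pointing it at a large topic list.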