<?php
/**
 * <https://y.st./>
 * Copyright © 2016 Alex Yst <mailto:copyright@y.st>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <https://www.gnu.org./licenses/>.
**/
$xhtml = array(
'title' => 'More specific spider output',
'body' => <<<END
<p>
I awoke this morning to find that <a href="/en/domains/newdawn.local.xhtml">newdawn</a> had frozen on me.
Now that the spider has been rewritten so that interruptions do no lasting harm to the database, I have not bothered to run it in a way that would let it keep running when newdawn goes down, even though it runs from <a href="/en/domains/cepo.local.xhtml">cepo</a>.
It has been crawling a particularly large website over the past couple of days though, so now it will have to start crawling that site from the beginning.
I hate not being able to see how close the spider is to reaching a savable state.
I previously had to remove the progress report feature as it was not compatible with the new MySQL database, but now, the spider is back to working partially from $a[RAM].
I have added back the progress report feature, though it only reports the progress in relation to the current site, not the whole crawl.
</p>
<p>
After that, I decided to add a new feature to the $a[cURL] download-limiting class, so once more, I got to work on building another wrapper class.
I went through the <a href="https://secure.php.net/manual/en/resource.php">list of resource types</a> and found eighteen sections of the manual that mention functions in need of wrapping.
Fourteen of these sections document standard $a[PHP] extensions while four document $a[PECL] extensions.
I will start with the standard extensions before working on the $a[PECL] stuff.
I wanted to work on the <a href="https://secure.php.net/manual/en/book.sem.php">semaphore, shared memory and $a[IPC]</a> functions today, as I wanted to see why the functions in this section of the manual have three different prefixes.
However, <a href="https://secure.php.net/manual/en/function.ftok.php"><code>\\ftok()</code></a>, the one function in this section of the manual that had no prefix, depends on resources from the <a href="https://secure.php.net/manual/en/ref.shmop.php">shared memory</a> extension, so I worked on the wrapper class for those functions instead.
After actually completing this tiny class, I realized that <code>\\ftok()</code> did not really need to be implemented in any class, let alone as a method that required another class.
</p>
<p>
With that out of the way, I added a new feature to my curl_limit class to output information to the command line about the current progress of a download.
The problem with this, though, is that it would output information whether or not the calling script wants it, so I called it a debugging feature, made it optional, and turned it off by default.
With it already being optional, I decided to make it optional in the spider too.
I added a new configuration option, turned it on in the example configuration, and modified the existing output lines in the spider to respect the debug output setting.
Finally, I moved the code that makes the spider work over $a[Tor] into its own constant so that it would be reusable.
I made a few other minor adjustments to the spider as well, but nothing noteworthy.
I have a few other somewhat important features that I want to implement in it, but I think that I will mostly put it aside for now and only fix any bugs I find in it.
I want to get to work building some forum software; I have learned almost everything that the spider project had to teach me, though admittedly there is still a little left to learn from it.
When the spider finishes its long crawl, perhaps that is when I will pick it back up again.
</p>
<p>
We made plans to head to Eugene on Sunday or Monday, in order to work on moving what junk we still have in Springfield into a storage unit so we can work on getting that house on the market.
The plan was tentative: if there was anything that we could do to help with Cyrus&apos; Boy Scout project in that time, we would do that instead.
However, tonight, Cyrus finally got his Boy Scout project approved! We are headed to the library, where he will be coordinating some sort of organization effort.
</p>
<p>
Normally, my mother does not want to hear anything about what I am up to.
I have learned to be pretty quiet and not bring things up if I can help it.
However, she actually asked me today, so I took a risk, and told her about the spider.
To avoid having her check back on its progress and see something disappointing, I explained that given my lack of resources (my lack of disk space), I could not support a powerful spider or build a working search engine around it.
She suggested that we pick up a new hard drive at the second-hand computer store when we are in Eugene! I do not think that she understood just how much hard drive space I think that I need.
I figure that I need at least a terabyte or two.
Getting a hard drive that large will not be cheap, even second-hand.
Now that I think on it, I would probably also need a lot more $a[RAM] and a better processor, too.
A simple upgrade will not work.
I would need an entire replacement machine, and it would need to be a powerful one.
</p>
<p>
I wrote back to my old school asking again how I can get my password reset.
Hopefully they will actually respond this time.
</p>
<p>
My <a href="/a/canary.txt">canary</a> still sings the tune of freedom and transparency.
</p>
END
);