<?php
/**
 * <https://y.st./>
 * Copyright © 2016 Alex Yst <mailto:copyright@y.st>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <https://www.gnu.org./licenses/>.
 **/
$xhtml = array(
'title' => 'More specific spider output',
'body' => <<<END
<p>
I awoke this morning to find that <a href="/en/domains/newdawn.local.xhtml">newdawn</a> had frozen on me.
With the spider rewritten so that interruptions will not do lasting harm to the database, I have not even been trying to run it in a way that would let it keep going when newdawn goes down, despite the fact that it runs from <a href="/en/domains/cepo.local.xhtml">cepo</a>.
It has been crawling a particularly large website over the past couple of days though, so now it will have to start crawling that site from the beginning.
I hate not being able to see how close the spider is to reaching a savable state.
I previously had to remove the progress report feature because it was not compatible with the new MySQL database, but now the spider is back to working partially from $a[RAM].
I have added the progress report feature back, though it only reports progress on the current site, not on the whole crawl.
</p>
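<p>
As a rough illustration of the sort of per-site report I mean, here is a simplified sketch; the function name and the counts below are made up, not the spider's actual code, which derives its numbers from the data it keeps in $a[RAM]:
</p>
<pre>
// Hypothetical helper; the real spider pulls these counts from its in-memory records.
function report_site_progress(\$site, \$crawled, \$queued)
{
	\$total = \$crawled + \$queued;
	\$percent = (\$total > 0) ? 100 * \$crawled / \$total : 0;
	printf('%s: %d of %d known pages crawled (%.1f%%)'.PHP_EOL, \$site, \$crawled, \$total, \$percent);
}

report_site_progress('example.com', 250, 750); // example.com: 250 of 1000 known pages crawled (25.0%)
</pre>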
<p>
After that, I decided to add a new feature to the $a[cURL] download-limiting class, but before doing so, I once more got to work on building another wrapper class.
I went through the <a href="https://secure.php.net/manual/en/resource.php">list of resource types</a> and found eighteen sections of the manual that mention functions in need of wrapping.
Fourteen of these sections document standard $a[PHP] extensions, while four document $a[PECL] extensions.
I will start with the standard extensions before working on the $a[PECL] stuff.
I had wanted to work on the <a href="https://secure.php.net/manual/en/book.sem.php">semaphore, shared memory and $a[IPC]</a> functions today, as I wanted to see why the functions in that section of the manual have three different prefixes.
However, <a href="https://secure.php.net/manual/en/function.ftok.php"><code>\\ftok()</code></a>, the one function in that section of the manual that has no prefix, depends on resources from the <a href="https://secure.php.net/manual/en/ref.shmop.php">shared memory</a> extension, so I worked on the wrapper class for those functions instead.
After actually completing this tiny class, I realized that <code>\\ftok()</code> did not really need to be implemented in any class, let alone as a method that required another class.
</p>
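<p>
For a rough idea of the general shape such a wrapper takes, here is a simplified sketch (the naming is hypothetical; it is not the actual class):
</p>
<pre>
// Minimal sketch of a class wrapping a shmop shared memory resource.
class shmop_segment
{
	protected \$shmop;

	public function __construct(\$key, \$mode, \$permissions, \$size)
	{
		\$this->shmop = shmop_open(\$key, \$mode, \$permissions, \$size);
		if (\$this->shmop === false) {
			throw new \\RuntimeException('Unable to open the shared memory segment.');
		}
	}

	public function read(\$start, \$count)
	{
		return shmop_read(\$this->shmop, \$start, \$count);
	}

	public function write(\$data, \$offset)
	{
		return shmop_write(\$this->shmop, \$data, \$offset);
	}

	public function size()
	{
		return shmop_size(\$this->shmop);
	}

	public function delete()
	{
		return shmop_delete(\$this->shmop);
	}

	public function __destruct()
	{
		shmop_close(\$this->shmop);
	}
}

// \\ftok() only turns a path into an integer key; it holds no resource of its own.
\$segment = new shmop_segment(\\ftok(__FILE__, 'y'), 'c', 0644, 1024);
\$segment->write('example data', 0);
</pre>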
<p>
With that out of the way, I added a new feature to my curl_limit class to output information to the command line about the current progress of a download.
The problem though is that it would produce this output whether or not the script using the class wanted it, so I called it a debugging feature, made it optional, and turned it off by default.
With the output already optional in the class, I decided to make it optional in the spider too.
I added a new configuration option, turned it on in the example configuration, and modified the existing output lines in the spider to respect the debug output setting.
Finally, I moved the code that makes the spider work over $a[Tor] into its own constant so that it would be reusable.
I made a few other minor adjustments to the spider as well, but nothing noteworthy.
I have a few other somewhat important features that I want to implement in it, but I think that I will mostly put it aside for now and only fix any bugs I find in it.
I want to get to work building some forum software; I have learned almost everything that the spider project had to teach me, though admittedly there is still a little left to learn from it.
When the spider finishes its long crawl, perhaps I will pick it back up again.
</p>
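<p>
The progress output itself relies on nothing more exotic than $a[cURL]'s own progress callback; stripped of the limiting logic and the debug toggle, the idea looks roughly like this (a simplified sketch, not the class itself, and the address is just a placeholder):
</p>
<pre>
\$curl = curl_init('https://example.com/some-large-file');
curl_setopt(\$curl, CURLOPT_RETURNTRANSFER, true);
// CURLOPT_NOPROGRESS must be disabled or the progress callback never fires.
curl_setopt(\$curl, CURLOPT_NOPROGRESS, false);
curl_setopt(\$curl, CURLOPT_PROGRESSFUNCTION,
	function (\$curl, \$download_size, \$downloaded, \$upload_size, \$uploaded) {
		if (\$download_size > 0) {
			// Write the progress report to the command line.
			printf('Downloaded %d of %d bytes (%.1f%%)'.PHP_EOL, \$downloaded, \$download_size, 100 * \$downloaded / \$download_size);
		}
		// Returning a non-zero value would abort the transfer.
		return 0;
	});
curl_exec(\$curl);
curl_close(\$curl);
</pre>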
<p>
We made plans to head to Eugene on Sunday or Monday to work on moving what junk we still have in Springfield into a storage unit, so that we can work on getting that house on the market.
The plan was tentative; if there was anything we could do to help with Cyrus' Boy Scout project in that time, we would do that instead.
However, tonight, Cyrus finally got his Boy Scout project approved! We are headed to the library, where he will be coordinating some sort of organization effort.
</p>
<p>
Normally, my mother does not want to hear anything about what I am up to.
I have learned to be pretty quiet and not bring things up if I can help it.
However, she actually asked me today, so I took a risk and told her about the spider.
To keep her from checking back on its progress later and seeing something disappointing, I explained that given my lack of resources (namely disk space), I could not support a powerful spider or build a working search engine around it.
She suggested that we pick up a new hard drive at the second-hand computer store when we are in Eugene! I do not think that she understood just how much hard drive space I think I need.
I figure that I need at least a terabyte or two.
Getting a hard drive that large will not be cheap, even second-hand.
Now that I think on it, I would probably also need a lot more $a[RAM] and a better processor.
A simple upgrade will not work.
I would need an entire replacement machine, and it would need to be a powerful one.
</p>
<p>
I wrote back to my old school asking again how I can get my password reset.
Hopefully they will actually respond this time.
</p>
<p>
My <a href="/a/canary.txt">canary</a> still sings the tune of freedom and transparency.
</p>
END
);