- <?php
- /**
- * <https://y.st./>
- * Copyright © 2017 Alex Yst <mailto:copyright@y.st>
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <https://www.gnu.org./licenses/>.
- **/
- $xhtml = array(
- 'title' => 'Learning Journal',
- 'subtitle' => 'CS 2203: Databases 1',
- 'copyright year' => '2017',
- 'body' => <<<END
- <h2 id="Unit1">Unit 1</h2>
- <p>
- It seems the learning journal assignment for this course is the same as the ones we had all term for <a href="https://y.st./en/coursework/CS1101/" title="Programming Fundamentals">CS 1101</a> and <a href="https://y.st./en/coursework/CS1102/" title="Programming 1">CS 1102</a>.
- That could be good or bad.
- I learn much better when I can write in-depth about what I'm doing.
- It helps me review what I've learned and internalise it.
- In <span title="Programming Fundamentals">CS 1101</span>, I was able to write as much as I needed to.
- However, in <span title="Programming 1">CS 1102</span>, I was docked points for verbosity, even when I tried to keep things short.
- Not only did I lose points, I wasn't even writing enough to actually help me learn.
- I'm hoping in this course, I can write as much as I need to like I could in <span title="Programming Fundamentals">CS 1101</span>.
- </p>
- <p>
- The learning guide claims that Oracle, IBM DB2, Sybase, Ingres, Informix, Microsoft SQL Server, Microsoft Access, Open Office Database, MySQL, PostgreSQL, Berkeley DB, and SQLite all use a common language to manipulate and access data: $a[SQL].
- This is <strong>*extremely*</strong> misleading.
- They all use <strong>*similar*</strong> languages, all of which are dialects of $a[SQL], but if you try running a query written for one database software on another database software from the list, chances are, something will fail.
- The problem is that most, if not all (I'm not familiar with every database system on the list), of these $a[DBMS]s do not use the actual $a[SQL] standard, which makes interoperability a pain.
- It's fair to say these systems all use $a[SQL], even though they're not using it in its standardised form, but to say these systems all use the same language is quite untrue; they're all using different $a[SQL]s (different Structured Query Languages).
- These $a[SQL]s share many similar elements though, and there do exist some basic queries that'll run on multiple $a[DBMS]s.
- </p>
- <p>
- As I began reading Database Fundamentals, it talked about data stored graphically.
- At first, I thought it was storing data as graphics, as <strong>*images*</strong>.
- What a horribly inefficient way to store data!
- Extracting data would also be excruciating.
- It seems it's actually talking about storing information as <strong>*graphs*</strong>.
- I think even this isn't quite correct though.
- My guess is that this type of database stores information about the <strong>*points*</strong> on a graph without actually storing any graph.
- There's a huge difference, and as students of computer science, I like to think we should all be able to understand the difference in this course.
- </p>
- <p>
- The paragraph on why we need a $a[DBMS] instead of using flat files was informative.
- I've always been told flat files are bad because they're inefficient.
- I was told flat files are slow.
- However, it seems there's an even more important reason to use a $a[DBMS].
- A $a[DBMS] allows multiple writes to the same database to occur simultaneously without reverting changes made by other writes.
- With flat files, if one process reads the file in preparation to write, then another reads the file in preparation to write, we're left with a race condition.
- Whichever one actually writes first will have its write overwritten by the second process' write.
- What a pain.
- I think most $a[DBMS] software avoids this by keeping the database in $a[RAM], making changes there, and writing when need be, but I haven't actually looked into it much yet, so that could be wrong.
- </p>
- <p>
- The book discussed the possibility of multiple attributes making use of the same data domain.
- At first, this didn't make any sense to me.
- I mean, the actual concept made sense, but I didn't see any reason you'd set up a database this way.
- After all, why would you need to record the same characteristic twice?
- However, drawing options from the same data domain doesn't necessarily mean the attributes correspond to the same characteristic.
- For example, I could have a line of bicycles.
- My bikes are all of one colour, while having a pattern in a second colour.
- In a database representing these bikes, three of the attributes might be <code>colour0</code>, <code>colour1</code>, and <code>pattern</code>.
- <code>colour0</code> could be the base colour of the bike, while <code>colour1</code> could be the colour of the overlaid <code>pattern</code>.
- In this case, <code>colour0</code> and <code>colour1</code> would have the same data domain.
- They might be hex colour numbers, dye colour numbers, or human-readable colour names.
- However, they both represent slightly-different characteristics of the bikes.
- Having the same data domain doesn't make the two properties have the same meaning!
- </p>
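- <p>
- As a rough sketch of this idea (the table and column names here are hypothetical ones of my own), such a relation might be declared with two columns drawing values from the same colour domain:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE bicycle (id INTEGER PRIMARY KEY, colour0 VARCHAR(20), colour1 VARCHAR(20), pattern VARCHAR(40))</code>
- </p>
- </blockquote>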
- <p>
- It's interesting that a relation can have multiple unique keys (also known as "candidate keys").
- I wonder how useful that is in practice.
- I can think of several cases in which having a unique key composed of multiple attributes is useful.
- If I have a database of $a[URI]s broken down into their components for example, multiple $a[URI]s can contain identical schemes, userinfos, hosts, ports, paths, queries, and/or fragments - provided at least one of those seven components differs between $a[URI]s.
- However, when is it useful to have multiple parts of the records separately enforce record-specificness?
- I'd love to see an example of such a case, be it realistic or fabricated.
- I've heard about primary keys, and I knew what they were before this course, but I wasn't aware of the counterpart term, alternate keys.
- I didn't know there was a term for those.
- I was also aware of the use of surrogate/artificial keys, but likewise unaware that there was a term for them.
- I've likewise seen a lot of foreign key use, but I've only seen them used by tuples in one table to point to tuples in another.
- The book says they can also be used to point back at the same table that holds the foreign keys.
- I find that to be an incredibly strange concept.
- I mean, for completeness of the $a[DBMS], this <strong>*absolutely should*</strong> be a perfectly valid setup.
- Pointing back at the same table seems like a special case on the surface, but honestly, it shouldn't be in its implementation.
- However, when would it be useful to do this?
- I can think of a few hypothetical situations in which this would be necessary, but no actually-useful scenarios come to mind.
- </p>
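- <p>
- A minimal sketch of that $a[URI] case, using hypothetical table and column names of my own, might enforce uniqueness across all seven components at once:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE uri (id INTEGER PRIMARY KEY, scheme VARCHAR(32), userinfo VARCHAR(128), host VARCHAR(255), port INTEGER, path VARCHAR(1024), query VARCHAR(1024), fragment VARCHAR(255), UNIQUE (scheme, userinfo, host, port, path, query, fragment))</code>
- </p>
- </blockquote>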
- <p>
- The reading material in the second book was mostly the same as in the first book, but more brief.
- The second book also discusses the differences between $a[SQL] and languages such as Pascal, C, and $a[COBOL].
- It then continues that $a[SQL] cannot be used as a program itself because it's not a complete programming language.
- Um.
- What?
- That's not a fair assessment of $a[SQL] and is <strong>*incredibly*</strong> misleading.
- While $a[SQL] is very much a computer language, it's not a <strong>*programming*</strong> language at all!
- $a[SQL] is plenty complete, it's just not a language that describes algorithms, and thus, cannot build programs.
- Other examples of this concept can be found in the form of $a[XML], $a[CSS], and the language of maths.
- None of these languages can be used to build a program, but they're all complete and they all function for what they're actually supposed to do.
- (Actually, you might be able to build a program using the language of mathematics, I'm not sure on that one.
- It'd be difficult and time-consuming though.)
- </p>
- <p>
- I'm confused by the assignment's suggestion that we document our relations in the manner presented in the textbook.
- I don't remember there being any documented relations in there, and when I went back and skim-reviewed, I didn't notice any either.
- The only thing I could see was diagrams.
- This isn't art class, and pictures aren't documentation.
- Additionally, the images didn't say a whole lot.
- Actual paragraphs would say a lot more.
- </p>
- <p>
- I don't feel any interactions I had this week in this course were overly meaningful.
- We didn't have assignments to grade from last week, and this week's discussion assignment wasn't one that was open to actual discussion; it was just a reciting of definitions.
- Next week, we'll have assignments to grade though, so I'll have interactions from that to comment on.
- I'm curious to see how other students documented their relations.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- Please forgive my examples.
- Honestly, some of them are pretty lousy.
- I can give tons of examples of each concept in various computer-related fields, even those not related to databases, or even using other non-computer data concepts.
- However, using real-world objects for this sort of example is always a struggle, because quite frankly, many abstract data concepts hardly relate to real-world objects at all in my opinion.
- Even the examples given to us in the learning guide were a stretch.
- For example, the learning guide talked about a postal address as being like a tuple.
- However, a postal address is not at all a real-world object; it's purely data and abstraction!
- Names are always an abstraction to help us understand what is being referred to, be they names for people or streets; it's like variable names in a program pointing to a memory location.
- Even the numeric part of the address is abstraction, not a real-world object, and these numbers only refer to specific places on the street because we all agree they do.
- Data doesn't need to be stored on a computer to be data or be just an abstraction.
- Our ability to abstract like this without even noticing we're doing it is a huge part of why we humans are the most-developed animal species on the planet.
- </p>
- <p>
- A tuple is just a record. Pieces of data in the tuple all relate to one another.
- They are an ordered set that corresponds to how an entry appeared in the database at the time the query was performed.
- As a real-world object, a tuple is like an object; any object.
- Objects have properties, such as hardness, shape, colour, pattern, density, and size.
- An infinite number of properties could be added, depending on how detailed you want your description of the object to be.
- Other objects will share those same properties, but their properties will be set to different values.
- A tree and a pebble both have a size and a density, but the tree has a larger size and lower density than the pebble.
- </p>
- <p>
- A relation specifies an ordered set of data domains.
- The same data domain may appear multiple times in the list.
- The relation also contains a heading and a body.
- In our example, this would be the list of types of data to describe our real-world objects.
- If we describe our objects using their height in centimetres, colour, and density in grams per cubic centimetre, our relation's list of domains would be number, colour, and number.
- As we can see, the number data domain appears twice, but for each property, it has a very different meaning.
- Having two properties that fit into the same data domain doesn't mean those properties describe the same characteristic.
- </p>
- <p>
- A relation heading specifies a list of properties as long as the relation's domain list, and each property in the property list corresponds to the domain at that same position in the domain list.
- Property zero holds data from the set specified by domain zero, property one's data fits into domain one's set, and so on.
- As stated above, in our example, the relation heading's property list would be height in centimetres, colour, and density in grams per cubic centimetre.
- </p>
- <p>
- A relation body is a set of tuples.
- Each tuple consists of as many properties as there are domains in the relation's domain list and as many properties as are in the relation heading's property list.
- Each attribute/value pair in the tuple corresponds to the domain and property at the same position in the domain list and property list.
- Again, that means attribute/value pair zero from each tuple corresponds to domain zero and attribute zero, and so on.
- There can be a variable number of tuples in the body; it depends on the current state of the database.
- As more data is added, more tuples could appear in the list, while when data is removed from the database, tuples could disappear from the body.
- In our example, the relation body would be the set of all objects we describe.
- If a new object is created, for example in a factory, our relation body could grow.
- If we break something, our relation body might shrink.
- Alternatively, we might continue to store the broken object, but store it as multiple objects; one for each piece.
- In that case, our relation body would grow.
- </p>
- <p>
- The relation's relation degree is the number of attributes in the relation's heading.
- The more attributes are present, the higher the relation degree.
- As said before, a tuple is like a real-world object; <strong>*any*</strong> real-world object.
- However, your description of objects can contain as many or as few properties as you wish.
- A relation degree would, in this example, be like the level of detail you provide.
- The more qualities you include about objects, the higher the relation degree.
- </p>
- <p>
- The cardinality of a relation is just the number of tuples in that relation's body.
- That means that over time, as data is added and removed, the cardinality of a given relation can grow or shrink.
- Continuing our example, the cardinality is just the number of objects you describe.
- If our world has five hundred objects, our cardinality would be five hundred.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I like your example, it was easy to follow.
- The visual representation was a nice touch as well.
- Knowing what each attribute is for is important for interpreting the data.
- In your example, <code>GrowTime</code> is an integer, but we must also know that this is the time in weeks that the plant has been grown before being sold, not, for example, the number of years the plant must grow to reach maturity.
- </p>
- </blockquote>
- <blockquote>
- <p>
- Yes, a relation can also be called a table.
- I wonder though why we don't <strong>*always*</strong> call it a table.
- The term "relation" seems a bit vague and inaccurate to me.
- A table can hold many rows of like data, but what does the term "relation" even imply?
- It seems to me that it implies a way to tie the data to other data, not a set of data that is similar in nature, but not actually related to other parts of the relation.
- For example, if we have a table of book information, none of the rows are related to one another at all, but they still compose the body of a "relation".
- On the other hand, a foreign key attribute actually <strong>*does*</strong> tie related information together, yet foreign key attributes are <strong>*not*</strong> called a relation.
- I can easily see why people new to databases could be confused by the terminology.
- </p>
- </blockquote>
- <blockquote>
- <p>
- If you read the discussion assignment again, you'll see it says "Your example may be of your own choosing but should illustrate each term in the context of a relation using a real-world object such as a car, book, course, university, etc.".
- There's no other way to take that than that it needs to be a physical object, as far as I can tell.
- </p>
- </blockquote>
- <blockquote>
- <p>
- It seems strange to me that the number of attributes in a relation is referred to as the degree of the relation.
- The term sounds like it refers to how closely-related the data is, but that's not even remotely how the term's used.
- Instead, it's just how much information the relation contains.
- </p>
- </blockquote>
- <h2 id="Unit2">Unit 2</h2>
- <p>
- I'm not sure I agree with the logic mentioned by the book as to why a primary key cannot be null.
- Null is the lack of a value, but sometimes, the lack of a value <strong>*is*</strong> the identifier.
- For example, imagine a table of values that are returned based on user input.
- If the user inputs nothing, the logical row to look up would be that with a primary key of null.
- Note that an empty string and a lack of input aren't the same thing, and depending on the client software, may indicate very different things.
- However, the use of null foreign keys to signal a lack of a tie to the other table is useful.
- Making that possible though requires that null primary keys not be allowed.
- To put it simply, I agree that it's a good thing that primary keys cannot be null, but I disagree with the book's logic on why this should be.
- </p>
- <p>
- The semantic integrity constraint seems very useful.
- I'm interested to see how one would implement this on the $a[DBMS] level.
- It seems like you'd need to define a function that determines whether a given value fits into the specified domain, but defining functions seems like it's outside the scope of what can be done in a $a[DBMS].
- This seems particularly true for a domain constraint.
- If the domain is the range of English colour names, the $a[DBMS] isn't going to know which strings qualify as colour names and which don't.
- It looks like the only forms of domain constraints possible might be format constraints and range constraints though.
- If this is true, the feature is limited and incomplete.
- </p>
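- <p>
- If I understand correctly, simple range and format constraints like these can be expressed with a <code>CHECK</code> clause; here's a hypothetical sketch with table and column names of my own invention:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE paint (name VARCHAR(40) PRIMARY KEY, hex CHAR(7) CHECK (hex LIKE '#%'), price DECIMAL(6,2) CHECK (price BETWEEN 0 AND 1000))</code>
- </p>
- </blockquote>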
- <p>
- Speaking of incompleteness and limitations, the <code>WITH DEFAULT</code> constraint seems incomplete as well.
- It doesn't work with, for example, strings.
- Why not?
- There's no reason not to allow a default string.
- It's also interesting that if two null fields are compared for equivalence in $a[SQL], they return unknown and not true.
- This really shows a difference between $a[SQL] and programming languages.
- </p>
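- <p>
- For what it's worth, the plain <code>DEFAULT</code> clause from the $a[SQL] standard does seem to accept a string literal; here's a minimal sketch with a made-up table, assuming standard syntax rather than the book's <code>WITH DEFAULT</code>:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE member (id INTEGER PRIMARY KEY, country VARCHAR(40) DEFAULT 'unknown')</code>
- </p>
- </blockquote>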
- <p>
- Section 2.5 (the part before 2.5.1, anyway) makes relational calculus seem impractical.
- In theory, it'd be a lot easier for the human user to use than relational algebra, but it involves giving instructions to the computer without providing any way for the computer to actually carry out those instructions.
- Relational calculus seems ambiguous; like trying to speak plain English to the computer.
- This could be simply because I don't understand it though, and might not be a flaw in relational calculus at all.
- In Sections 2.5.1 and 2.5.2, I could hardly understand any of the material, so relational calculus is likely something I'll need to study more on my own when time allows.
- The provided examples didn't seem ambiguous at all, so there's a high probability I misinterpreted something somewhere along the way.
- </p>
- <p>
- My main interactions of the week were through grading last week's assignments.
- Sadly, one student had failed to understand what a degree of relation is.
- The second student brought up cardinality in their assignment.
- They couldn't tell the cardinality, as we didn't define any data to go into the database, but it was interesting to see "Cardinality: NA" and "Cardinality can not be defined" in their table definitions.
- It showed that they were thinking about cardinality and knew what it was well enough to know they couldn't apply it just yet.
- </p>
- <p>
- I kind of wonder what $a[DBMS] we'll use to implement the database we're planning.
- At least, I assume we'll eventually implement what we've been doing these couple weeks.
- I'm hoping we use a free software $a[DBMS]; that way, I'll be able to complete the assignment on my own machine.
- Otherwise, if it's proprietary software, I might have trouble finding a machine to run it on.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- The learning guide describes a union as a combination of two relations, and says that in the union itself, the information from one set or the other is returned.
- The phrasing is a bit misleading; it makes it sound like the union only contains data from one of the combined sets, but not the other.
- However, what it actually means is that any data that can be found in the first table or the second table will be returned in the union.
- It's worth noting too that the use of the word "or" here is like how it would be used in programming; that is to say, it's <strong>*not*</strong> "xor", but more like the English "and/or".
- If a piece of data exists in both sets, it will still be returned, and anything contained in at least one of the two sets will exist in the union of them.
- If a tuple exists identically in both tables, it'll only be returned in the result once.
- Otherwise, two tuples would share the same data, including the same primary key, which isn't possible.
- In a union, both relations to be combined must share the same attributes.
- </p>
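- <p>
- As a small illustration (the table names are hypothetical), a union might be written like this, with any duplicate names appearing only once in the result:
- </p>
- <blockquote>
- <p>
- <code>SELECT name FROM staff UNION SELECT name FROM volunteers</code>
- </p>
- </blockquote>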
- <p>
- In an outer join, two tables each have the value of an attribute in each tuple compared to a value in each of the other table's tuples.
- When there's a match, a new tuple formed from concatenating the tuples of the matching attributes will be returned in the result.
- For any tuple in one or both tables that doesn't have a match (depending on which type of outer join is used), the tuple will be concatenated with a null-filled tuple in the return result instead.
- On the other hand, an inner join contains only tuples in which there was in fact a match of attributes.
- Null-filled tuples aren't added, and any tuple without a match is simply not returned in the resulting relation.
- </p>
- <p>
- The main thing these three processes have in common is that they take data from two relations, process it in some way, and generate a new relation.
- This new relation is typically not saved to the database, but is instead returned as a query result.
- Instead of asking the $a[DBMS] to return the existing tables and having the requesting application process them into the desired format, the requesting application just asks the $a[DBMS] to process the data.
- It allows for simpler program logic and probably better efficiency.
- The union is the simplest of the three, almost concatenating two tables together, but omitting duplicate entries.
- An inner join returns results basically indicating how two tables relate to one another.
- The outer join is just like the inner join, but can return data from one or both relating tables, even when that data doesn't have a mate in the other table.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I like the way you phrased what outer and inner joins do.
- Inner joins do focus on commonality, while outer joins return both the common and uncommon records.
- </p>
- </blockquote>
- <blockquote>
- <p>
- A union is almost like a concatenation of the two tables, but not quite.
- There's one major difference.
- Two tuples in the table can't have the same primary key, so any time a tuple from one input table matches a tuple from the other input table, one of the two tuples will not exist in the output table, while the other will.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I hadn't thought about it, but it makes sense that an inner join would be the most common type of join.
- Usually, you're trying to find the information that's there, not the information that's not.
- With a join, it's common that one table serves as a base, a table that doesn't change often.
- The other table has ever-changing data that we're interested in, but that data is more meaningful when given context from the more-stable table.
- We can see this in the form of the books lent table in our recent assignments.
- Knowing what books are checked out is useful, but it's even more useful if we're able to match those books to the library member that checked them out and see all that data together.
- </p>
- </blockquote>
- <h2 id="Unit3">Unit 3</h2>
- <p>
- When grading the assignments last week, we were required to grade partially based on whether the student explained the purpose of each constraint placed on the data.
- This doesn't seem fair, as we were never asked to do that.
- We were told to decide what constraints were needed, not explain <strong>*why*</strong> they were needed.
- </p>
- <p>
- I'm confused about entities in entity relationship modelling.
- It looks to me like the entities are just the relations (tables).
- So why do we use a different term in entity relationship modelling?
- My best guess is that I missed something, and they're not representations of the relations at all.
- But if they're not, why do they have the same attributes?
- </p>
- <p>
- The other book cleared up the confusion.
- "Entities" in the diagrams aren't actually entities, but entity sets.
- Entity sets seem exactly like tables, bringing our number of terms that refer to the same thing up to three (relations, tables, and entity sets).
- Entities also have several names (entity, tuple, row).
- It seems the term "cardinality" is used in a different sense here too, to refer to the number of possible relationship sets exist between entity sets.
- Why can't each term refer to a single concept?
- That'd make things less ambiguous and less confusing.
- </p>
- <p>
- I like that there's a way to mark fields as optional in the diagram.
- As an example, this worked out very well for the assignment this week.
- We needed a telephone number field for library members, but not everyone has telephone service.
- That field <strong>*must*</strong> be optional; otherwise, we're going to need to implement some sort of workaround in the future for dealing with members who choose not to use a telephone.
- </p>
- <p>
- I have no idea why, but it took me way too long to figure out the symbolic lines that mark the relationships between entities.
- I had to stare at the diagrams and read what they were supposed to mean numerous times.
- Part of it was that I spent the week in emotional distress, so it messed with my ability to properly take in information, but there's also the fact that even under ordinary circumstances, I handle information in text form much better than in graphical form.
- </p>
- <p>
- The other book mentions the fact that data should only appear in a given database once.
- Very true!
- When redundant info shows up in the database, it makes updating the database a pain.
- You've got to update the same thing multiple times, once for each occurrence.
- It also takes up more storage space to save the same information in multiple places.
- And at the end of it, what have you gained by having duplicates?
- </p>
- <p>
- This week was a bit overwhelming.
- The course content was fine, but life issues got in the way of my fully absorbing it.
- I ended up having to complete the assignment in a rush.
- I might have to come back and reexamine the material from this week, but as I'll be trying to keep up with new units as well, that could take time.
- I was very happy to see we were asked to use Dia this week.
- Alternatives were offered, but Dia is a tool I got familiar with in <span title="Programming Fundamentals">CS 1101</span>.
- It was a tool I didn't need to spend a great amount of time figuring out how to use, and it's free software available in the Debian repository.
- I did have to take some time to figure out how to get the right connector lines to draw though, as we only used basic arrows in <span title="Programming Fundamentals">CS 1101</span>.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- The page you linked to resolves to a 404 error.
- </p>
- <p>
- A many-to-many relationship cannot exist because each entity in the relationship must exist before the other.
- That is, unless the relationship isn't actually dealt with in the setup of the database.
- For example, if a car can have several drivers and a driver may drive several cars, the two fields don't need to be linked.
- A join query can still be run, linking the data on an as-needed basis.
- Another option is to employ a third table, used exclusively for joining the two main tables.
- If one table contains drivers and another contains vehicles, the third table can have three fields: its own primary key, the foreign key of the drivers table, and the foreign key of the vehicles table.
- We could even remove the primary key field and use the two foreign key fields combined as the primary key.
- With this third table, the drivers and the cars need to exist before the relationship in the third table is set up, but the two don't need to exist before each other.
- </p>
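- <p>
- Assuming hypothetical <code>driver</code> and <code>vehicle</code> tables that each have an <code>id</code> primary key, that third table might be declared roughly like this:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE driver_vehicle (driver_id INTEGER REFERENCES driver (id), vehicle_id INTEGER REFERENCES vehicle (id), PRIMARY KEY (driver_id, vehicle_id))</code>
- </p>
- </blockquote>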
- </blockquote>
- <blockquote>
- <p>
- I'm not sure what "ER" is, sorry.
- </p>
- <p>
- You're probably right about the join query.
- I was thinking if there were three rows in one table that matched five rows in another, the join query would multiply them and make fifteen rows.
- This is probably wrong though.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I like your example of this being a chicken and egg situation.
- Each must come before the other, but each <strong>*can't*</strong> come before the other.
- I want to say there's a way to do it.
- I want to say that as long as the foreign key fields could be null in the first entity to begin with, the second entity could be set up, then the first entity corrected.
- However, there's probably some technical reason why this can't happen.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I had that same idea of using a third table (I think most of us did), but I didn't have a term for it.
- I think your term "bridge table" describes the concept perfectly!
- By using the primary keys of the two base tables as foreign keys in the bridge table and using both foreign keys combined as the primary key, you ensure that it's easy to look up the link between the entities you need linked and that each entity cannot be linked to the same other entity more than once, though it can still link to other entities as it needs to.
- </p>
- </blockquote>
- <blockquote>
- <p>
- The textbook said something about converting the many-to-many relationship into two one-to-many relationships, but I couldn't wrap my head around what that meant.
- Reading your post made it click for me.
- Thank you!
- It looks like I reached the correct solution before when I set up the third table, I just didn't have the right words to describe why I'd done that and what exactly it accomplished.
- </p>
- </blockquote>
- <h2 id="Unit4">Unit 4</h2>
- <p>
- Unfortunately, my laptop died on me at the beginning of the week.
- I lost the first day of the week due to needing to catch up with last week.
- The next day, my laptop died on me before I could get any coursework done, but I didn't have time to go get a replacement because I had a shift at work.
- The next day, I replaced the machine with one from the recycling centre before work and downloaded the operating system.
- And on the next day, I finished the hardware swapping (the new machine didn't have a working Wi-Fi card), installing the new operating system, backing up my data from the old machine (the old hard drive was still intact), transferring it to the new machine, and installing my usual software bundle (including software needed for school, such as KeePassX and Dia).
- That left me with three days left to cram a week's worth of studying into, and two of those days were work days for me.
- I managed to get one of those shifts covered, but not the other.
- Needless to say, this wasn't a week of heavy reflection on the material, and was instead a week of frantic cramming at the last minute.
- My new machine is slow, but at least it runs my familiar Debian environment, so I didn't have to struggle to learn a new system while simultaneously struggling to catch up in my studies.
- </p>
- <p>
- The main topics for this week were redundancy removal and normalisation.
- I was excited to see a discussion of both these topics, though I quickly saw that they were the same thing.
- Database normalisation is the removal of redundant data; I mistakenly thought we were going to additionally cover data normalisation, the art of converting equivalent data into a singular representation.
- Data normalisation is something I get kind of geeky about, and it's very useful for keeping a clean database.
- </p>
- <p>
- I hadn't thought about it before, but the textbook brings up a good point.
- When unrelated data is stored in a table together, instead of in separate tables, it makes adding data to the table a bit awkward.
- Adding information in one area requires adding the information of all fields of that tuple, so this requires adding unrelated information just to complete the set.
- Deleting information can also be detrimental, as unrelated information can be lost as well.
- The names for these oddities are insertion and deletion anomalies, respectively.
- The book also mentions update anomalies, which are when one instance of the data is updated while another remains unchanged.
- This creates conflicting data in the database.
- I feel like this is what people normally aim to address when they try to remove redundancy from a database.
- I actually had to deal with this issue between this term and last.
- I had a legal change of name recently, as I've never liked my birth name.
- I asked University of the People to update my name in their records, and my name was updated in the student portal, while it remained unchanged in Moodle.
- I thought I was going to have to interact with students and professors this term under my old name!
- Thankfully, the issue was fixed in time, but it's a good real-world example of data redundancy and an update anomaly: student names here at the university are stored in at least two places, but there's no reason for them to be.
- </p>
- <p>
- The section on functional dependency was thoroughly confusing.
- I couldn't understand what it was talking about, and the provided example was no help at all.
- It was only after the book started talking about how A implying D doesn't mean that D implies A that things started to make sense.
- I think it's saying that for every instance of A and D, if A matches, so does D.
- If that's indeed what it's talking about, it would be the basis of relation decomposition.
- Not every implication makes a good basis for decomposing the table, but these implications are necessary for preventing data loss.
- Armstrong's Axioms seem to back up my thoughts as to what property is being discussed too, so I think I know what's going on.
- </p>
- <p>
- When grading the assignment submissions from last week, I found that we're being graded on whether or not we specified what type each attribute is to be.
- The assignment directions didn't tell us to do that though!
- I had that information, as I needed to decide on it for assignments prior, but I didn't include it in the relational diagram because the book on relational diagrams didn't include that information in any of its examples or instructions.
- We students had no way to know that we were supposed to include it in our diagrams; I get the feeling most of us will lose points on that.
- </p>
- <p>
- I understand the first four normal forms, and they all seem very useful in eliminating redundancy.
- However, I don't understand what Boyce-Codd Normal Form is or how it works.
- I should look into this more when time allows.
- There are likely other resources that would explain it in a way I can better understand.
- I'm finding a lot of the material in the book is explained in ways that I just don't seem to get.
- I'm not sure if it's because databases aren't my strong suit or if the explanations are just very obtuse.
- It doesn't help either that abbreviations are used heavily by the book.
- If the terms were spelled out, I could do a pretty-good read-through.
- However, I have to instead keep going back and looking up what certain abbreviations even mean.
- I think I understand what a functional dependency is.
- On the other hand, I have no clue what an "FD" is, and have to keep going back and skimming to find the section that actually spells out the term.
- I know the terms, just not the shorthand.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- The first normal form, strictly speaking, is what is required to build a relational database.
- Any data that doesn't fit the minimal requirements needed to be in the first normal form cannot be stored in a relational database.
- The data must either be cleaned up or a different database type must be used.
- The first normal form dictates that every field's domain be atomic and indivisible.
- Simply put, this means no table cell can contain multiple pieces of information.
- Each piece of data stored in a tuple's field must be singular, not composite.
- Additionally, rows and columns in the first normalised form have no specified order, table cells contain only one value (which must be in that column's domain), rows do not have hidden data, and there are no duplicate rows.
- All of these rules are enforced by the $a[DBMS].
- Decomposition can be used to break off fields that aren't able to comply with this form.
- For example, if a field needs to contain multiple values if a part of the main table, it can be broken off into another table, using the primary key of the initial table as a foreign key to relate the two tables together.
- </p>
- <p>
- In the second normalised form, all the requirements of the first normalised form must be met, with the additional requirement that if the primary key is composed of multiple fields, no other field may depend on one field of the primary key without depending on all of them.
- If a given field depends on only part of the primary key, we're going to see duplicates throughout our relation whenever that partial primary key matches.
- For that reason, any such fields must be broken off into their own relation.
- Decomposition is used to break these partially-dependent fields off into a smaller table, this time using the field they depend on (which is part of but not the entire primary key) as the foreign key in the new table.
- </p>
- <p>
- In the third normalised form, all conditions of the second (and thus first) normalised form must be met.
- In addition to those requirements, the data is even further refined.
- No field may depend on a non-key field.
- If it did, it would cause redundancy in cases where multiple key field values can lead to the same other field value.
- For example, if K is our key field, A depends on K, and B depends on A, then if two values of K produce the same value of A, those same two values of K necessarily must produce the same value for B as well.
- This is not allowed by the third normalised form, as it would allow redundant B values.
- Instead, B (and any other such indirectly-dependent values) must be broken off into their own relation.
- Decomposition to reach this normalised form works the same way as for the second normalisation, except that the foreign key won't be a part of the main table's primary key.
- </p>
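- <p>
- Using the K, A, and B fields from the example above (the table names are made up), the decomposition might look roughly like this:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE detail (a INTEGER PRIMARY KEY, b INTEGER); CREATE TABLE main (k INTEGER PRIMARY KEY, a INTEGER REFERENCES detail (a))</code>
- </p>
- </blockquote>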
- </blockquote>
- <blockquote>
- <p>
- You make a good point.
- Not only is data loss an issue, data <strong>*integrity*</strong> is at stake.
- If one copy of the data is updated, a different copy might still be queried.
- Even worse and more likely, different copies of the data will be updated and queried at different times, resulting in a total hodgepodge of values where only one should be present.
- </p>
- </blockquote>
- <blockquote>
- <p>
- The second normal form doesn't require that a relation have only a single attribute as its key.
- It only requires that if the key is made up of multiple attributes, no attribute depends on part of the key without depending on the entire key.
- As you said in your post, in the second normal form, nothing can depend on only part of the key, and depending on only part of the key is only possible when the key is made up of multiple fields.
- As a result, having both requirements wouldn't make sense and would be a bit redundant.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I actually encountered that type of situation recently, in which one value in a database got updated and another copy of the value didn't.
- I had a legal change of name between terms (I've never liked my birth name), and I asked University of the People to update my name in their records.
- They ended up updating my name in the student portal, but not in Moodle.
- Our student names exist in at least two places in the University of the People database!
- I feared I'd have to interact with students and professors this term under my birth name, but thankfully, I got the issue cleared up in time.
- </p>
- <p>
- Slower query responses can also be a problem, depending on the situation.
- Weak hardware or heavy use of the database will amplify the issue.
- </p>
- </blockquote>
- <h2 id="Unit5">Unit 5</h2>
- <p>
- We seem to be focussing on implementation of databases this week.
- Yay!
- Laying down a good foundation by learning how to properly plan a database was important.
- It's a good thing we didn't jump straight to working with the database software.
- However, I've been excited to learn how to properly use a $a[DBMS], so I'm glad we're at that point now.
- I've used MySQL in the past, but I can't say I'm very good at it.
- Maybe now with some actual training, I can do better.
- </p>
- <p>
- The textbook finally admitted that $a[DBMS] vendors implement $a[SQL] differently, resulting in a need to write $a[SQL] in a $a[DBMS]-specific way.
- Previously, the book claimed that many $a[DBMS]s spoke the same language, $a[SQL], but these different $a[SQL] dialects mean that the $a[DBMS]s are effectively speaking different languages.
- There's an official $a[SQL] standard, sure, but because it's not implemented by any $a[DBMS], it's not actually a common language for database software to work with.
- </p>
- <p>
- I knew that an automatically-incrementing integer field could be put in place, but I didn't realise it could start anywhere but zero.
- In the example provided, the integer started at five hundred, then incremented by one.
- The fact that the incrementation is by one was specified too, which means it should be possible to increment by other integers as well.
- Again, I had no idea.
- I thought that incrementing by one was the only option.
- </p>
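- <p>
- If I'm reading the example right, the standard syntax for this looks something like the following (the table and column names are my own), though whether a given $a[DBMS] accepts it is another matter:
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE library_member (id INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 500 INCREMENT BY 1) PRIMARY KEY, name VARCHAR(80))</code>
- </p>
- </blockquote>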
- <p>
- I've always thought it the job of the application software to sanitise and validate values before attempting to put them in a database.
- It makes sense to do the validation instead in the $a[DBMS] when possible though.
- Sanitising still has to be done on the application side, however, to prevent dangerous strings containing arbitrary commands from being passed to the $a[DBMS].
- Additionally, I get the feeling that more-complex forms of data validation can be done from the application side.
- For example, can a $a[DBMS] constraint ensure that a $a[URI] is properly formatted and normalised in accordance with $a[RFC] 3986?
- I don't think that it can.
- However, it <strong>*can*</strong> be done on the application side, and when I built a Web crawler, I implemented such a thing that ran on all $a[URI]s before storing them in the database.
- </p>
- <p>
- The part on views was interesting.
- They could even be used to abstract the tables a bit, if needed.
- For example, if the database ever needs to be restructured (say, the database had been poorly-designed to begin with, and needed to be normalised to remove redundancy), a view could be set up to allow old applications to still function with the new database setup.
- Cursors are also a new concept for me.
- They make perfect sense, as they prevent the full results from needing to be loaded into memory twice at once (once in the $a[DBMS] and once in the application that queried the $a[DBMS]), but I hadn't encountered that sort of thing before.
- </p>
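- <p>
- Here's a minimal sketch of that idea, with made-up names: if an old application expects one wide member table that has since been split in two, a view could stitch the pieces back together:
- </p>
- <blockquote>
- <p>
- <code>CREATE VIEW member_full AS SELECT m.id, m.name, p.phone FROM member AS m JOIN phone AS p ON p.member_id = m.id</code>
- </p>
- </blockquote>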
- <p>
- It seems surprising and dangerous that the <code>DELETE</code> command would delete everything if no <code>WHERE</code> clause is used.
- It's also a bit inconsistent.
- For example, with a <code>SELECT</code> statement, there must be something to select.
- You can't just run <code>SELECT FROM `table`</code>.
- At the very least, it seems like deleting all rows from a table should require the asterisk, such as <code>DELETE * FROM `table`</code>.
- The same anomaly seems to be present with the <code>UPDATE</code> command, which again, doesn't match the behaviour of <code>SELECT</code>.
- </p>
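- <p>
- To illustrate what surprised me (the table name is hypothetical): the first statement below removes a single row, while the second silently removes every row in the table:
- </p>
- <blockquote>
- <p>
- <code>DELETE FROM member WHERE id = 42</code>
- </p>
- <p>
- <code>DELETE FROM member</code>
- </p>
- </blockquote>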
- <p>
- We previously covered the union of tables, but now that we're looking at what queries need to be run to perform the union, we're being given a bit more information.
- It was stated before that when performing a union, duplicate rows will be condensed.
- That is to say, if two rows are identical, only one of the two will be returned.
- However, it seems that by using the <code>ALL</code> key word, we can make the <code>UNION</code> key word not condense the rows.
- Duplicates will show up in the returned union table if any are present in the data.
- I can see this as being useful in some cases, but I can also see the default behaviour of omitting duplicate data as also being useful for other cases.
- It's nice to have options.
- The <code>ALL</code> key word also works with the <code>INTERSECT</code> and <code>EXCEPT</code> key words, but I wasn't sure how that could be at first.
- It sounds like if the tables include duplicate rows, they'll all be returned.
- However, how can the tables possibly have duplicate rows?
- The $a[DBMS] is supposed to disallow that, seeing as the primary key for each row is required to be unique.
- Later though, it all became clear.
- Queries can contain sub-queries, just as (when programming) a function call can be used as an argument to another function call, with the return value of the inner function call being passed to the function specified by the outer function call.
- If the <code>ALL</code> key word is used for a union, the result of that union can have further operations performed on it in the form of an intersection or difference.
- </p>
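- <p>
- If I've understood the nesting correctly, a query along these lines (with made-up table names) would keep the duplicates from the union and then intersect the result with a third table, though I haven't tested whether every $a[DBMS] accepts the parentheses:
- </p>
- <blockquote>
- <p>
- <code>(SELECT isbn FROM books_lent_2016 UNION ALL SELECT isbn FROM books_lent_2017) INTERSECT SELECT isbn FROM books_on_hand</code>
- </p>
- </blockquote>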
- <p>
- I reworked my database plan, so the database doesn't quite look like what I've been thinking it would.
- It started with the constraints.
- I looked up the format an $a[ISBN] should be in.
- I was trying to figure out where the dashes should be in an $a[ISBN] string, so I could enforce that in the database.
- However, I saw several examples of $a[ISBN]s with the dashes in different places.
- It looks like there's no standard placement.
- Furthermore, according to the $a[RFC], the dashes don't affect equivalency, so two $a[ISBN]s that differ only in dash placement (or lack thereof) are the same $a[ISBN].
- So what does that mean altogether?
- I can't allow differing dash placements, as we might end up with different copies of the same book information, but I can't enforce that the dashes be placed in certain spots, as I can't find any information on the standard placement of them.
- I ended up converting that field into a <code>BIGINT</code> and disallowing numbers larger than thirteen digits long.
- Next, I dealt with telephone numbers.
- There are a multitude of formats for those.
- "800-555-0199", "(800)555-0199", "(800) 555 - 0199", and "800 555 0199" are common formats here in the States.
- Personally, I prefer the international standard format, with no extra punctuation: "+18005550199".
- One of the great advantages of this format is that it specifies the country code at the beginning, which allows for the use of numbers from other regions as well; the format isn't country-specific, and is instead all-inclusive.
- People from many countries are already using this format (often with extra punctuation), but people here in the States are reluctant to admit that telephone numbers from outside the States are even valid.
- After debating back and forth a bit, I decided to convert that field into an <code>INT</code>, which only has a precision of ten digits.
- It eliminates the possibility of using international numbers, which is a major drawback in my opinion, but it also shouldn't cause too much fuss from the librarians.
- </p>
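- <p>
- For the record, the $a[ISBN] field I ended up with looks roughly like this (simplified, with approximate table and column names):
- </p>
- <blockquote>
- <p>
- <code>CREATE TABLE book (isbn BIGINT CHECK (isbn BETWEEN 0 AND 9999999999999), title VARCHAR(255))</code>
- </p>
- </blockquote>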
- <p>
- From there, I kept running into all sorts of problems as far as syntax.
- Many things from the book simply don't work in LibreOffice Base, such as <code>GENERATED ALWAYS</code>.
- I couldn't get <code>INT(13)</code>, <code>INTEGER(13)</code>, or the like to work either.
- I had to use a <code>BIGINT</code>, then use a constraint to keep it below fourteen digits.
- If everyone actually followed the $a[SQL] specification instead of creating their own dialects of the language, this would be so much easier to learn.
- Instead, I spent hours trying to find the pieces of information I needed to get my database set up, chasing several leads, most of which didn't pan out.
- I never did get the database set up the way I wanted it to be, but I think my implementation is "good enough".
- </p>
- <p>
- On the discussion board, someone mentioned an ability to combine more than two tables.
- I'd love to see an example of this.
- I'm hoping we cover it in a later unit.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <h4>Inner joins</h4>
- <p>
- An inner join combines data from two tables, forming a new table.
- This new table is returned, but usually not saved to disk.
- Every row in the new table has data that corresponds to data in the two tables being joined.
- There are three main types of inner joins.
- </p>
- <h5>Equi-joins</h5>
- <p>
- An equi-join is a type of inner join in which the joining factor is equality between a value in one of one table's columns and a value in one of the other table's columns.
- For every instance in which a match is found between these two values, a tuple will exist in the returned table.
- That means that if the column compared from one table has two rows with an identical value and the other table has three rows with that same value in the compared column, a total of six (two times three) rows will be in the returned table (along with any other matches involving different values).
- An equi-join can be performed with a query similar to the following:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0`, `table1` WHERE `table0`.`field0`=`table1`.`field1`</code>
- </p>
- </blockquote>
- <h5>Natural joins</h5>
- <p>
- The book touts natural joins as being an "improved" version of equi-joins.
- However, they seem like a <strong>*serious*</strong> downgrade, to me.
- They're potentially-ambiguous and don't always work.
- A column is automatically selected by the $a[DBMS] to join the tables, and that column is chosen based on matching column names.
- If the tables don't have any columns with matching names, the join won't work.
- The book doesn't say what will happen if the tables have multiple matching column names, but it can't be good.
- It might depend on the $a[DBMS], but perhaps the selection will sometimes join based on the wrong column or will fail to perform the join at all.
- Maybe all the columns with matching names would need matching data though, which would probably be the best option, given the alternatives.
- A natural join may be performed with a query similar to this one:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` NATURAL JOIN `table1`</code>
- </p>
- </blockquote>
- <h5>Cross joins</h5>
- <p>
- Cross joins are the simplest type of inner join, but they also return the largest data set.
- All rows in the first table will be combined with all rows in the second table, creating a table with as many columns as the two tables had combined, and as many rows as the number of rows in the first table multiplied by the number of rows in the second table.
- A cross join query will look similar to this:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0`, `table1`</code>
- </p>
- </blockquote>
- <h4>Outer joins</h4>
- <p>
- In an outer join, data from two tables is combined, but some or all of the rows might lack data from one of the two tables.
- If data in one table matches multiple data sets in the other table, data may show up in the returned table multiple times.
- </p>
- <h5>Left and right outer joins</h5>
- <p>
- Left outer joins and right outer joins are nearly identical, aside from the way in which they are produced.
- One table is selected as the primary table, and all data from that table is returned.
- Any matching data from the secondary table is also returned, but any data from the secondary table that has no matches in the primary table is <strong>*not*</strong> returned.
- In a left outer join, the primary table is the first table mentioned in the <code>FROM</code> clause (the one named before the join key words), while the secondary table is the remaining table.
- In a right outer join, this is reversed.
- The <strong>*secondary*</strong> table is the one named first, while the <strong>*primary*</strong> table is the remaining table.
- In either case, any data returned from the primary table that has no data in the secondary table will simply have the secondary table's data filled in with nulls.
- A left outer join looks like this ...:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` LEFT OUTER JOIN `table1` ON `table0`.`field0` = `table1`.`field1`</code>
- </p>
- </blockquote>
- <p>
- ... while a right outer join looks like this:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` RIGHT OUTER JOIN `table1` ON `table0`.`field0` = `table1`.`field1`</code>
- </p>
- </blockquote>
- <p>
- The only difference between these two queries is the words <code>RIGHT</code> and <code>LEFT</code>.
- If you wanted to, you could avoid switching the word by instead swapping the order in which the tables are specified:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table1` LEFT OUTER JOIN `table0` ON `table1`.`field1` = `table0`.`field0`</code>
- </p>
- </blockquote>
- <h5>Full outer join</h5>
- <p>
- A full outer join returns all the data from both tables, regardless of whether the fields match up.
- If fields <strong>*do*</strong> match up, the matching data is combined into a single row in the returned table, but if the data in one table has no match in the other, the missing information will simply be filled in with nulls.
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` FULL OUTER JOIN `table1` ON `table0`.`field0` = `table1`.`field1`</code>
- </p>
- </blockquote>
- </blockquote>
- <blockquote>
- <p>
- Usually, I much prefer words over diagrams, but your included images were simple and straight to the point.
- No unnecessary attributes were included, which made it much easier to see what was going on.
- </p>
- </blockquote>
- <blockquote>
- <p>
- You mention the combining of more than two tables.
- How would one accomplish this?
- And with a left or right outer join, are the extra tuples in the centre table included or left out?
- </p>
- </blockquote>
- <blockquote>
- <p>
- You make a good point that a full outer join is sort of a combination of a left outer join and a right outer join.
- Your example tables got me thinking about something else, too.
- Your one table basically ties the other two tables together, replacing a many-to-many relationship with two one-to-many relationships.
- I wonder though if there's a query we could perform to join the <code>Student</code> and <code>Location</code> tables, using the <code>Student_Location</code> table's data in the join condition.
- It seems like there <strong>*should*</strong> be a way, and I hope we cover that in a later unit.
- </p>
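- <p>
- Just to make my question concrete, I imagine the query would look something like the following, though the column names are pure guesses on my part and I haven't tested anything:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `Student`, `Student_Location`, `Location` WHERE `Student`.`Student_ID` = `Student_Location`.`Student_ID` AND `Student_Location`.`Location_ID` = `Location`.`Location_ID` /* column names guessed */</code>
- </p>
- </blockquote>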
- </blockquote>
- <h2 id="Unit6">Unit 6</h2>
- <p>
- I find it very interesting that the <code>DROP TABLE</code> and <code>ALTER TABLE</code> statements aren't defined in the $a[SQL] standard.
- It makes sense to separate the database designing process from database use, but it seems like there should be some way to shut off database use and go back into "design mode", if that makes any sense.
- "Design mode" wouldn't necessarily be usable on live databases, but things happen, and database structures need to be altered.
- It's incredibly restrictive to only allow the design phase to occur once.
- It's also interesting that $a[DBMS]s would implement non-standard field types.
- As a person that believes in standards and interoperability, I'll need to do my best not to rely on those non-standard data types in my tables.
- That said, the ability to modify existing databases is vital.
- With no standard way to do that, I won't attempt to avoid the non-standard way.
- </p>
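- <p>
- For example, adding a column to an existing table seems to generally look something like this, though this is only a sketch; the exact syntax and the available data types vary from $a[DBMS] to $a[DBMS], since none of this is standardised:
- </p>
- <blockquote>
- <p>
- <code>ALTER TABLE `table0` ADD COLUMN `field2` VARCHAR(64) /* `field2` is a placeholder name */</code>
- </p>
- </blockquote>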
- <p>
- Much of what we covered this week was stuff that would've been incredibly helpful in completing last week's assignment.
- I found myself struggling last week, and I had to look up a lot of this stuff already.
- For some of it, I didn't even find the proper solution, and had to put together something a bit messy.
- Had we covered this week's material last week, I feel like my assignment submission would've been easier to complete and would've come out better.
- </p>
- <p>
- It's been another hectic week for me offline, so it was nice having a week of mostly review.
- </p>
- <p>
- In the grading, I found that two of the students made the telephone number field for borrowers unable to be set to <code>null</code>.
- So what do they want borrowers without telephone service to do?
- Are these people unable to register in the system?
- Are they unable to borrow books?
- The telephone system is outdated and broken in many ways.
- It's still widespread, which is kind of sad, but some people have managed to move beyond it.
- Some people, for example, instead use $a[SIP] to do their voice chatting.
- Some people, such as myself, don't even need telephone service or a replacement for it, because we're not really into voice chatting at all.
- Not everyone has telephone service, not everyone needs telephone service, and reasonable companies do not make the telephone number field mandatory.
- In practice, I've found that almost every system run by the companies I come in contact with doesn't actually demand telephone numbers.
- The companies act like it's a required field, but when you tell them you don't have telephone service, they are in fact able to leave that field blank.
- </p>
- <p>
- One of the students submitted an entity relationship diagram instead of the required screenshot of LibreOffice Base's entity relationship diagram based on the actual tables.
- The queries they ran to create the database built an incomplete and broken database, but their entity relationship diagram was about perfect.
- It looked like they had the theory and structure down pretty well, but didn't know how to actually implement what they were trying to.
- </p>
- <p>
- I think the main thing I got from interactions this week was a hint to try using phpMyAdmin instead of LibreOffice Base.
- Another student was using phpMyAdmin and posted screenshots, giving me the idea.
- I've been finding the LibreOffice Base interface to be mildly obtuse.
- Options aren't where I expect them to be, for one thing, and it's difficult to find what I'm after.
- LibreOffice Base also seems to be geared toward graphical use, and that's not how we're using it in class.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- There's one obvious solution:
- </p>
- <blockquote>
- <p>
- <code>UPDATE `books` SET `Cost` = 0 WHERE `Pub_Date` < '10/01/2009'</code>
- </p>
- </blockquote>
- <p>
- Other solutions are possible of course, but many of them would be convoluted and inefficient.
- I'm interested to see if any students come up with other good solutions though, or if everyone pretty much goes with variants of this solution.
- I myself can't come up with any other viable alternatives, but that obviously doesn't mean that equally-good or even better solutions aren't available.
- </p>
- <p>
- It's worth noting that in the phrasing of the scenario, we're trying to alter rows with publication dates <strong>*before*</strong> the given date.
- The given date cannot be <strong>*before*</strong> itself, so we're looking for rows with a date that is <strong>*less than*</strong> the given date, <strong>*not*</strong> less than <strong>*or equal to*</strong> it.
- The <code>Cost</code> field is a <code>decimal(10,2)</code>, so while there are dollar signs in the displayed table, we shouldn't let that fool us into thinking that the price is in string format or that we should include a dollar sign.
- Dollar signs are not allowed, and we should just set the field to a value of <code>0</code>.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I developed a query based on Professor Washington's table; I didn't run it on a live database.
- In her example, the dates were in that format, so I assumed that in whatever database she was using, that was the format for dates.
- I much prefer the big endian notation that you used, and that's the date format I always use when given a choice.
- </p>
- </blockquote>
- <blockquote>
- <p>
- Good catch!
- I didn't even notice that one of the book titles was too long.
- Add that to the dollar signs in the cost fields, and the table's data really doesn't fit the table definition.
- </p>
- </blockquote>
- <blockquote>
- <p>
- That's a great idea, working in phpMyAdmin!
- I hadn't even considered using that instead of the recommended LibreOffice Base.
- I'm more familiar with the phpMyAdmin interface though (I have zero experience with LibreOffice Base outside this course), and it's even available in the Debian repositories, making it easy to install on my system.
- Thanks for the great idea!
- </p>
- </blockquote>
- <blockquote>
- <p>
- You make a great point that the date format is specific to LibreOffice Base.
- The format used by LibreOffice Base is the international standard date format; however, this isn't necessarily the format used by all $a[DBMS]s.
- The problem is that the <code>date</code> type is non-standard, and isn't a part of the $a[SQL] language.
- Any database software that implements this data type has no standard to base it off of, so it has a variety of implementations.
- </p>
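- <p>
- For example, assuming the dates in the example table were in month/day/year order, I'd guess the earlier query would need to be written more like this for a $a[DBMS] expecting the international standard format (untested):
- </p>
- <blockquote>
- <p>
- <code>UPDATE `books` SET `Cost` = 0 WHERE `Pub_Date` < '2009-10-01'</code>
- </p>
- </blockquote>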
- </blockquote>
- <h2 id="Unit7">Unit 7</h2>
- <p>
- It's nice that $a[SQL] treats different types of white space as equivalent, like languages such as Java and $a[PHP].
- This allows for formatting that makes reading by humans easier.
- In simple cases such as bash, treating new line characters as command ends is mildly inconvenient.
- In more extreme cases such as Python, all ability to format in a user-readable way is hopelessly lost.
- $a[SQL] doesn't suffer from that problem.
- </p>
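- <p>
- For example, the equi-join query from a couple of weeks ago could just as easily be written across several lines:
- </p>
- <blockquote>
- <p>
- <code>SELECT *<br/>
- FROM `table0`, `table1`<br/>
- WHERE `table0`.`field0` = `table1`.`field1`</code>
- </p>
- </blockquote>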
- <p>
- It's interesting that the inequality operator in standard $a[SQL] is <code><></code>.
- In most computerised languages I've seen, <code>!=</code> is used instead, though I've also seen <code>~=</code> used in Lua.
- Unfortunately though, as we've previously discussed, the $a[SQL] standard isn't really followed by anyone.
- This results in different $a[DBMS]s using differing inequality operators.
- As someone that highly values standards and interoperability, I find this very disappointing.
- </p>
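- <p>
- For example, the standard form of a simple test looks like this (generic, untested names) ...:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` <> 0</code>
- </p>
- </blockquote>
- <p>
- ... while the form I'm more used to seeing in other languages and in various dialects looks like this:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` != 0</code>
- </p>
- </blockquote>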
- <p>
- The <code>BETWEEN</code> key word is misleading.
- It's <strong>*inclusive*</strong>, which means that it doesn't actually find the values strictly between the lower and upper bounds.
- Instead, it captures values <strong>*from*</strong> the lower bound <strong>*to*</strong> the upper bound.
- There's a huge difference between these concepts, and in my opinion, this is an oversight in the $a[SQL] language specification.
- Either the behaviour should be corrected or the key words changed to match the current behaviour.
- It's said that the <code>BETWEEN</code> key word makes it clearer to the reader of a query what is intended than comparison operators do.
- However, if the reader understands basic English, this is untrue, as the <code>BETWEEN</code> key word doesn't have the intuitive behaviour.
- </p>
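- <p>
- In other words, if I've understood the semantics correctly, this query ...:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` BETWEEN 10 AND 20</code>
- </p>
- </blockquote>
- <p>
- ... is merely shorthand for this one, bounds included:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` >= 10 AND `field0` <= 20</code>
- </p>
- </blockquote>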
- <p>
- On first look, it seems strange that $a[SQL] requires fields used in the <code>ORDER BY</code> clause to be within the selected data.
- However, if you think about it, it's not strange at all.
- It's vital that these fields be included.
- The reason for this is simple: complex queries.
- When <code>SELECT</code>ing data, you don't always select from just a single table.
- In a single-table case, it's pretty clear which unlisted field values belong to which returned tuples.
- However, that becomes an impossible task when taking into consideration that any number of tables may be used to produce the returned table.
- What should happen if I try to join some data from each of two tables, omitting some fields and omitting duplicate entries, then try to order by a field not included?
- One of the returned tuples could have multiple field values "associated" with it in the unreturned data, and ordering by that field would be ambiguous.
- Come to think of it, you don't even need a second table for this kind of issue to pop up.
- All you need to do is run a <code>SELECT</code> query that doesn't select all the columns, selects only columns that don't enforce uniqueness, and omits duplicates.
- A language should be consistent.
- If this use case isn't workable, it makes sense not to allow other instances of the more general use case either.
- </p>
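- <p>
- As a quick sketch of the problem (untested, and pretending <code>table0</code> also has a <code>field1</code> column), a query like this one would have no sensible answer, because once duplicates are collapsed, a single returned row could correspond to several different <code>field1</code> values:
- </p>
- <blockquote>
- <p>
- <code>SELECT DISTINCT `field0` FROM `table0` ORDER BY `field1`</code>
- </p>
- </blockquote>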
- <p>
- It's stupid though that the columns are indexed from one.
- We are computer scientists.
- We understand that zero is not only a number, but the base number.
- Indexing from one seems like it was done to make things "easier", but really, it seems insulting.
- It seems like whoever wrote that part of the specification thought we were too dim to figure out the basic, standard, widely-used paradigm of indexing from zero when doing <strong>*anything*</strong> computer-related.
- (It's worth noting though that I index from zero even outside the realm of computers, so maybe I just have a bias against indexing from one.)
- </p>
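- <p>
- For example, if I've read the book correctly, ordering by the second selected column requires <code>ORDER BY 2</code>, where a zero-indexed language would have used <code>1</code> (generic, untested names again):
- </p>
- <blockquote>
- <p>
- <code>SELECT `field0`, `field1` FROM `table0` ORDER BY 2</code>
- </p>
- </blockquote>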
- <p>
- I was very surprised to learn that outer joins aren't standard in $a[SQL].
- Many $a[SQL] dialects implement them, but they're not actually available in the real $a[SQL] language.
- </p>
- <p>
- Upon reading about the concept of joining a table to itself, I immediately wondered about names and namespaces.
- How could column names be unambiguously referred to when the tables joined had the same column names (names) and table names (name spaces)?
- The book got right into that issue though, so I didn't have to wait to see how that would work out.
- The use of aliases is interesting.
- A fellow student sent me a complex query via email a while back that used aliases, but at the time, I thought that aliases only served to create needless complexity.
- Shortening a table name for use in a single query seems just lazy, like you're trying to avoid typing a few extra characters at the cost of clearly-readable queries.
- However, in the case of joining a table to itself, aliases become an invaluable tool for disambiguation.
- </p>
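- <p>
- For example, joining <code>table0</code> to itself only seems expressible once each copy of the table gets its own alias (a rough sketch; I haven't run it):
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` AS `t0`, `table0` AS `t1` WHERE `t0`.`field0` = `t1`.`field0`</code>
- </p>
- </blockquote>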
- <p>
- Three of the restrictions on the <code>UNION</code> operator seem very strange to me.
- First, <code>UNION</code> clauses can't be used in subqueries.
- Why not?
- Second, aggregate functions can't be used to generate the tables that the <code>UNION</code> operator operates on.
- Again, what purpose does this limitation serve?
- Lastly, the tables joined can't be created using the <code>ORDER BY</code> clause.
- Um.
- What?
- What does ordering the tables have to do with preventing their concatenation with other tables?
- That seems like straight-up foolishness.
- That said, I think there's probably a good reason for all three of these limitations; I just don't know what that reason is.
- </p>
- <p>
- The book mentions that while you can't order the tables to be concatenated, you <strong>*can*</strong> order the results of the concatenation.
- I'd like to point out that these are <strong>*not*</strong> the same thing though and do <strong>*not*</strong> serve the same purposes.
- If I want to order the tables individually but not mix their results in the combined table, I'd need to use the <code>ORDER BY</code> clause with the individual tables (which is disallowed) instead of with the table produced by the <code>UNION</code> operator.
- </p>
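- <p>
- That is, the only permitted placement of the clause seems to be on the combined result, something like this (untested, with the usual generic names):
- </p>
- <blockquote>
- <p>
- <code>SELECT `field0` FROM `table0` UNION SELECT `field1` FROM `table1` ORDER BY 1</code>
- </p>
- </blockquote>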
- <p>
- The book says that constraints are thought to be negative, but database constraints do a positive job.
- I think this is entirely the wrong way to look at it.
- Database constraints keep the database from reaching certain states, states of disorder and non-usefulness.
- Because they are <strong>*keeping something from happening*</strong>, they are performing a <strong>*negative*</strong> job.
- They <strong>*subtract*</strong> from the range of possibilities.
- However, <strong>*negative*</strong> is not the same thing as <strong>*bad*</strong> and <strong>*positive*</strong> is not the same thing as <strong>*good*</strong>.
- If you have a positive result from an $a[HIV] test that you take, is that a good thing just because it's positive?
- No.
- No it is not.
- That's an extreme example, but other examples are easy to find as well.
- A better way to think about it is to decouple the concept of positivity and negativity from the concept of good and bad.
- </p>
- <p>
- Triggers seem useful in some contexts, but like the book says, they add complexity and take away control.
- On the surface, I'm not sure how to feel about them.
- That said, they're also non-standard, which tells me <strong>*exactly*</strong> how to feel about them.
- I'm highly in favour of standards and interoperability, so unless this feature is ever added to the real $a[SQL] specification, it's something that I'd avoid unless a boss/client specifically asked me to implement such a thing in a database they had me construct for them.
- </p>
- <p>
- The book said that a view's query could not make use of the <code>UNION</code> operator.
- If you think about it, it makes sense.
- The <code>UNION</code> operator (for unknown reasons) cannot be used in subqueries.
- The query that a view represents is <strong>*always*</strong> a subquery.
- Therefore, the <code>UNION</code> operator can never be used in a view's query.
- What I find surprising though is that update queries can be performed on views, even if only on views that meet certain requirements.
- If the view is really a named subquery, does that mean that updates can be performed on other subqueries if they meet the same qualifications?
- My guess is a strong "no".
- The <code>WITH CHECK OPTION</code> seems pretty handy for making the views seem more coherent.
- It's not strictly necessary to use it, but in most cases, I think it'd be a good idea to reduce confusion.
- </p>
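- <p>
- As a sketch (untested, and with <code>view0</code> being a name I just made up), declaring such a view would look something like this:
- </p>
- <blockquote>
- <p>
- <code>CREATE VIEW `view0` AS SELECT `field0` FROM `table0` WHERE `field0` > 0 WITH CHECK OPTION</code>
- </p>
- </blockquote>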
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- The <code>IN</code> key word tests for inclusion in a specified set.
- A set of valid values is specified as part of the query, and a value from a column is compared against that set.
- The value could be one that's included in the row, or it could be something calculated, such as the sum of the values of two or more attributes.
- This same effect could be achieved using a more-complex set of comparison operators and boolean logic, but it'd be harder to read.
- In this case, each value would need to be checked for individually, and these comparisons would need to be chained together with the <code>OR</code> operator.
- </p>
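- <p>
- For example (with throwaway, untested names), this query ...:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` IN (1, 2, 3)</code>
- </p>
- </blockquote>
- <p>
- ... should return the same rows as this harder-to-read equivalent:
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` = 1 OR `field0` = 2 OR `field0` = 3</code>
- </p>
- </blockquote>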
- <p>
- The <code>LIKE</code> key word is used to match values in which only part of the value is known.
- The underscore character (<code>_</code>) matches any single character, while the percent character (<code>%</code>) matches any string of characters.
- In this way, the underscore functions the way the question mark (<code>?</code>) does in bash or the full stop (<code>.</code>) does in $a[regexp], while the percent character functions the way the asterisk (<code>*</code>) does in bash or a full stop combined with an asterisk (<code>.*</code>) does in $a[regexp].
- In order to use either of these two characters literally, the <code>ESCAPE</code> key word can be used to specify an escape character, which can then be placed before any literal use of the character.
- Any character may be chosen as the escape character, allowing for some flexibility.
- Instead of needing to escape the escape character sometimes, you can just choose an escape character that isn't otherwise used in your search string.
- It's also worth noting that most $a[DBMS]s don't actually implement standard $a[SQL].
- Instead, they each make up their own, non-standard dialect of $a[SQL].
- One result of this is that not all $a[DBMS]s support the <code>ESCAPE</code> key word.
- </p>
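- <p>
- For example, to match values that start with a literal percent sign, an exclamation mark could be nominated as the escape character (a sketch only; as noted, not every $a[DBMS] will accept the <code>ESCAPE</code> key word):
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` LIKE '!%%' ESCAPE '!'</code>
- </p>
- </blockquote>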
- <p>
- Both the <code>IN</code> and <code>LIKE</code> key words can be preceded by the <code>NOT</code> key word to reverse their meanings.
- For example, <code>NOT IN</code> would find rows with values that aren't in the given set, while <code>NOT LIKE</code> would find rows with values that don't match the pattern.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I don't understand your patterns.
- One pattern, <code>'_r%'</code>, you said finds an <code>r</code> in the first position, but it actually finds it in the second.
- In your <code>'a_%_%'</code> pattern, your desired behaviour is achieved, but a simpler <code>'a__%'</code> (with two consecutive underscores) would achieve the same effect while being easier to read and easier on the pattern-checker's finite state machine.
- </p>
- </blockquote>
- <blockquote>
- <p>
- I like your idea of using the results of a <code>SELECT</code> statement as the set for an <code>IN</code> clause.
- It seems like a complex way to do things at first, but it's also more dynamic.
- For certain types of database structures, especially when hard-coded queries are put in applications, this could offer some very useful advantages.
- The hard-coded query can be set up to use a <code>SELECT</code> statement in an <code>IN</code> clause like this, and as the database is updated, the results returned by this query when it gets run are also updated.
- </p>
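- <p>
- Something along these lines is what I'm picturing (generic, untested names):
- </p>
- <blockquote>
- <p>
- <code>SELECT * FROM `table0` WHERE `field0` IN (SELECT `field1` FROM `table1`)</code>
- </p>
- </blockquote>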
- </blockquote>
- <blockquote>
- <p>
- Your <code>'a_%_%'</code> pattern seems a bit redundant.
- Removing either of the percent signs (preferably the first one for efficiency in the pattern interpreter's internal finite state machine) will result in the same matches.
- The <code>%</code> character doesn't modify the <code>_</code> character; it operates independently of it.
- </p>
- </blockquote>
- <h2 id="Unit8">Unit 8</h2>
- <p>
- Embedded $a[SQL] statements aren't contained in strings.
- They're included with the rest of the code, barely encapsulated at all.
- Some statement is used (for example, <code>EXEC SQL</code>) to instruct the compiler to switch parsing languages, then after the query is run, the compiler is instructed to switch back to interpreting in the host language with some token such as a semicolon.
- To me, switching languages in the middle of a file like this seems incredibly messy.
- Java does a bit better of a job encapsulating the $a[SQL] statements, using <code>#sql {</code> to enter $a[SQL]-parsing mode and <code>};</code> to return to Java-parsing mode, but it still involves switching syntaxes back and forth mid-file.
- It's unclean, and I don't really see a valid reason for this way of doing things.
- </p>
- <p>
- From the looks of it, the precompiler replaces the embedded $a[SQL] statements with the proper $a[API] calls, so the host language never even sees the embedded $a[SQL] statements.
- That means that there <strong>*has*</strong> to be a better way to do this than to embed the $a[SQL] statements in the middle of a file of another language.
- You could simply make the correct method calls yourself, but that has disadvantages too.
- The precompiler does things for you such as variable type checking.
- If you simply call the required methods yourself, your code will be cleaner, but at the cost of this added check.
- The check could save you a lot of development time and effort if you make a mistake.
- The precompiler also sets up an access plan for interacting with the database.
- This plan makes sure that queries are optimised and very quick.
- From the sounds of it though, it might store the access plan in the database itself, which would be a very bad thing.
- It would mean that your code, when run on another machine or with another database, wouldn't have the access plan it needs.
- </p>
- <p>
- The <code>SQLCODE</code> variable seems very useful at first, but is actually very limited in usefulness.
- Checking for a non-zero value can determine if an executed query was successful, and that can be vital.
- However, the non-zero code itself provides no useful information, as the codes are system-specific or hardware-specific.
- You can't simply read that the returned error code is, say, <code>3</code>, and expect to have any idea what that means.
- Even if you figure out what it means for your own hardware and system, you can't write any code that handles that error code in any specific way and still have your code be platform-independent.
- The <code>SQLSTATE</code> variable is much more practical.
- Its contents are standardised, so you can be more sure that a known value represents a known corresponding state.
- </p>
- <p>
- The examples in the book confuse me.
- In the example of a "static" $a[SQL] query, the <code>WHERE</code> clause used a host variable to supply the value that a given field had to match.
- However, in the example of a "dynamic" query, the example instead uses the <code>=?</code> notation.
- This seems to be the only difference.
- So, what is the advantage of each method?
- If host variables can be used in the <code>WHERE</code> clause, it seems unnecessary to instead use the <code>=?</code> syntax for the same purpose.
- Nothing new is accomplishable.
- </p>
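- <p>
- To restate my confusion concretely, my reading of the book's "static" example boils down to something like this (my own sketch, with made-up host variable names and without the backtick quoting, since the statement is embedded in a host language) ...:
- </p>
- <blockquote>
- <p>
- <code>EXEC SQL SELECT field0 INTO :result FROM table0 WHERE field1 = :search_value; /* :result and :search_value are made-up host variables */</code>
- </p>
- </blockquote>
- <p>
- ... while the "dynamic" version only seems to differ in using a placeholder instead of a host variable in the <code>WHERE</code> clause:
- </p>
- <blockquote>
- <p>
- <code>SELECT field0 FROM table0 WHERE field1 = ?</code>
- </p>
- </blockquote>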
- <p>
- The X/Open Call Level Interface looks like a great thing.
- Applications can be written to work with any $a[DBMS] that implements it.
- However, as usual, Microsoft refused to follow the standard, instead creating their own thing, called Open Database Connectivity.
- Open Database Connectivity was originally only available to Windows, but is now available on other platforms as well.
- There was no good reason to create a second, nonstandard $a[API] though, as far as I can tell.
- I could try to give them the benefit of the doubt, but let's be honest here: Microsoft is a screwy company and when they do these things, they're usually either being stupid (for example, reinventing old wheels just to say they have their own version), being malicious (for example, trying to create vendor lock-in and make sure the real standard never takes hold), or being some combination of the two.
- </p>
- <p>
- pureQuery looks much more like what I'm familiar with and much more like what I'd consider to be clean.
- Instead of including $a[SQL] within other-language source files and precompiling those files, pureQuery just takes $a[SQL] queries as string arguments and runs those queries.
- Something may later change my mind, but for now, I think this type of setup is the <strong>*only*</strong> sane way to use $a[SQL] within an application.
- Of course, there are other sane ways to access data from a $a[DBMS] database, such as the $a[API] provided by the X/Open Call Level Interface, but the X/Open Call Level Interface doesn't actually involve $a[SQL].
- Therefore, it's sane, but it's not a "sane way to use $a[SQL] within an application".
- </p>
- <h3>Epilogue (Unit 9)</h3>
- <p>
- Some of the review quiz and final exam questions needed further explanation that wasn't provided.
- Having the review quiz available helped though; it gave me a chance to guess and check.
- For example, one question asked if foreign keys are necessary.
- No context for the question was given.
- Strictly speaking, no, foreign keys aren't necessary for all use cases, but they're incredibly helpful in a lot of cases.
- I tried answering that they're not necessary though, and that got marked as incorrect.
- The quiz wanted me to claim they're necessary, even without the context needed to determine that.
- I forget what other questions like that one needed more information, but it was nice to be able to go back and review what the quiz was looking for.
- </p>
- <h3>Discussion post drafts</h3>
- <blockquote>
- <p>
- "Statically"-embedded $a[SQL] queries aren't necessarily completely static.
- The values entered into the database are allowed to be dynamic without the query being considered a dynamic one, which is quite misleading.
- The dynamically-generated values to insert into the database make "static" $a[SQL] queries plenty dynamic.
- "Static" queries make use of special variables, called host variables, to pass information to and from the host language.
- In the embedded $a[SQL] syntax, host variables begin with a colon, which is followed by a label (the variable name).
- These "static" queries can be associated by a precompiler with an access plan, and that access plan can then be stored in the database for faster querying later.
- However, if the access plan is stored in the database, it sounds like the code won't have access to the access plan when run on a machine other than the one the code was compiled on.
- This has serious disadvantages, and makes the access plans effectively useless for any software you plan to distribute, unless you have your users compile the code on their own machines.
- Storing the access plan in the database may speed querying at first, but as the database grows and changes, these plans can become outdated.
- This results in inefficient querying, and it gets worse the longer the application and database are in use.
- </p>
- <p>
- "Dynamic" queries are much more flexible.
- The access plans aren't generated at precompile time, so they aren't stored in the database at that time either.
- Instead, the access plan has to be generated at run time.
- The examples in the book confuse me, so I'm not sure exactly how writing "dynamic" queries is any different, aside from the syntax.
- However, they seem to provide better portability and long-term efficiency.
- As the access plans are generated at runtime, short-term efficiency is diminished.
- The application can't simply look up the access plan in the database; it must generate it from scratch every time.
- However, long-term efficiency is improved.
- As the database grows and changes, a stored access plan can become outdated and inefficient.
- Because "dynamic" queries involve recreating the access plan every time, the access plan is never outdated and always efficient.
- </p>
- <p>
- In both the examples in the book, the missing part of the query was the value to compare against in the <code>WHERE</code> clause.
- This makes it seem like both types of query can be used in the same situations.
- The book doesn't really explain where each should be used, unless I missed something.
- Additionally, "dynamic" queries seem to still have a "static" component to them, meaning that the precompiler must still be used for code that uses "dynamic" queries.
- In general, "dynamic" queries seem much more flexible, but that's only until you take into account that because of this "static" component, the disadvantages of "static" queries aren't actually avoided when using "dynamic" queries.
- In theory though, "static" queries would be better for an application used by a business internally.
- They would provide better efficiency, and the application could be recompiled every so often to update the access plans as the database changes.
- "Dynamic" queries would be better for anything with multiple copies.
- For example, software that gets distributed would do better with "dynamic" queries.
- Users aren't going to want to compile it at all usually, and if they do compile, they likely won't want to recompile later to update the access plans.
- </p>
- </blockquote>
- <blockquote>
- <p>
- You make a good point that "static" $a[SQL] statements are only parsed, validated, and optimised once.
- Parsing and validating only once offers great performance benefits.
- It's work that the computer only has to perform once.
- However, <strong>*optimising*</strong> the query only once is a <strong>*disadvantage*</strong>.
- As the database grows and changes, old access plans become outdated and inefficient.
- "Static" queries become slower over time, while "dynamic" queries remain relatively fast throughout the life of the application and throughout the life of the database.
- </p>
- </blockquote>
- <blockquote>
- <p>
- Like "static" $a[SQL], "dynamic" $a[SQL] is a type of embedded $a[SQL].
- "Embedded $a[SQL]" isn't a synonym for "'static' $a[SQL]", but a superset of it.
- In both "static" $a[SQL] and "dynamic" $a[SQL], partial $a[SQL] statements are embedded within files of the host language.
- The main difference is just how those embedded statements are handled.
- "Static" statements are handled entirely at precompile time.
- "Dynamic" statements are handled partly at precompile time, but mostly at run time.
- </p>
- </blockquote>
- <blockquote>
- <p>
- In what way are "dynamic" queries more customisable than "static" queries?
- From this week's reading assignment, it did seem that that was the major advantage of "dynamic" queries.
- However, I didn't understand exactly how they were more flexible.
- In the examples given by the textbook, it looked like the same parts of the query could be altered in both "static" and "dynamic" queries, but that different syntax was used to accomplish the same task.
- I'm sure I must have missed something.
- </p>
- </blockquote>
- END
- );
|