In the third part of this series of handy little PHP functions I’ll be describing a darn little function that only has an impact on the aesthetics of a site.
Of summaries and such
Imagine you have a site with a sidebar containing auto-generated summaries of (news) items, as seen on Jonathan Snook’s blog, for example.
(I could have referred to other sites, yet Jonathan’s blog is perfect for highlighting the why of this function.)
If you take a closer look at the summaries you’ll see that they all are cut at about the 200th character:
I had a few things I wanted to mention and separate posts about each didn’t seem warranted so here it is: 3 days left! The SXSW 2007 Interactive Pass Contest was off to a strong start with a bunch …
On top of that, the summary is a flat version of the actual post: in the full blog post of the item quoted above, “3 days left!” is a heading, for example.
Freebie: creating the summary
Before getting to reworkSummary(), a function to create a summary is needed. This is actually quite simple:
- Decode the string back to real HTML (the opposite of htmlentities())
- Strip out the HTML tags (which is why the string had to be decoded first)
- Cut at the 200th character
The order of these steps is important, since otherwise you could cut right in the middle of an (X)HTML tag…
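To see why the order matters, here is a minimal sketch. The sample strings are made up for illustration; only substr(), strip_tags() and html_entity_decode() are used, as in the function that follows.

```php
<?php
// Illustrative input: a tag that straddles the cut position.
$html = 'Read the <strong>full</strong> story';

// Wrong order: cutting first can leave half a tag behind.
$cutFirst = substr($html, 0, 12);
echo $cutFirst, "\n"; // Read the <st

// Right order: strip the tags first, then cut the flat text.
$stripFirst = substr(strip_tags($html), 0, 12);
echo $stripFirst, "\n"; // Read the ful

// Entity-encoded markup contains no real tags, so strip_tags()
// leaves it alone until it has been decoded.
$encoded = '&lt;strong&gt;full&lt;/strong&gt; story';
echo strip_tags($encoded), "\n";                     // unchanged, still encoded
echo strip_tags(html_entity_decode($encoded)), "\n"; // full story
```

The second half shows why the decoding step comes first: without it there is nothing for strip_tags() to strip.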
The PHP code for this is even easier.
// All your code are belong to bram.us
function makeSummary($text, $length, $hellip = "…") {
    // enforce html text (maybe it was encoded somewhere)
    $htmlText = html_entity_decode($text);
    // strip out the tags
    $flatText = strip_tags($htmlText);
    if (strlen($flatText) <= $length) {
        // text is not longer than length, return flatText
        return $flatText;
    } else {
        // text is longer than length, create the summary and add hellips
        $summary = substr($flatText, 0, $length);
        return $summary.$hellip;
    }
}
With the addition of the horizontal ellipsis at the end (to indicate that there's more to read) we're done, right?
And now ... the carbomb (Dr. Dre)
Wrong! In the example quote way at the top, you'll see that the summary was cut just after a word, yet some other summaries are cut right in the middle of a word, for example:
This seems to bite me in the ass more often than not but any time you add a new model or adjust your associations, be sure to delete the cached ones from the /app/tmp/ folder. I'll get inexplicable er...
Not that nice indeed.
Now I must say that I hadn't noticed this myself; it wasn't until a client of ours pointed it out while we were developing their site that I saw it. And boy, she sure was right: it does look pretty nasty when a word is cut in half and the horizontal ellipsis is placed right after it. *Yoink*
The solution to this problem actually is "as easy as killing babies with axes": just cut off the summary at the last space and you're done (there would be a problem if no space occurs, yet that won't happen in a regular news item or blog post). So next to the makeSummary() function we'll create a reworkSummary() function to take the appropriate action.
The resulting PHP code for this is:
// All your code are belong to bram.us
function reworkSummary($summary) {
    // find last space
    $pos = strrpos($summary, " ");
    if ($pos !== false) {
        // a space was found: return substring from 0 till found position
        return substr($summary, 0, $pos);
    } else {
        // no space was found: return the created summary
        return $summary;
    }
}
Now all you have to do is adjust makeSummary() so that it calls the reworkSummary() function. Please note that the default horizontal ellipsis is now preceded by a space, to make it look even nicer:
// All your code are belong to bram.us
function makeSummary($text, $length, $hellip = " …") {
    // enforce html text (maybe it was encoded somewhere)
    $htmlText = html_entity_decode($text);
    // strip out the tags
    $flatText = strip_tags($htmlText);
    if (strlen($flatText) <= $length) {
        // text is not longer than length, return flatText
        return $flatText;
    } else {
        // text is longer than length, create the summary and add hellips
        $summary = substr($flatText, 0, $length);
        return reworkSummary($summary).$hellip;
    }
}
Code example:
The following code ...
echo makeSummary("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.", 200);
echo makeSummary("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.", 200, " (continued)");
echo makeSummary("Lorem Ipsum is simply dummy text of the printing and typesetting industry.", 200, " (continued)");
... will output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type …
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type (continued)
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
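One caveat: strlen() and substr() count bytes, so on UTF-8 text with accented characters the cut can land in the middle of a character. Also, when the cut happens to fall exactly at the end of a word, reworkSummary() still drops that whole word. Below is a rough sketch of a variant that tries to handle both. The name makeSummaryMb() is hypothetical, and it assumes UTF-8 input and the mbstring extension; it is not part of the original functions.

```php
<?php
// Hypothetical variant of makeSummary(); assumes UTF-8 input
// and requires the mbstring extension.
function makeSummaryMb($text, $length, $hellip = " …") {
    // decode entities and strip the tags, as in makeSummary()
    $flatText = strip_tags(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));

    // short enough: return the flat text untouched
    if (mb_strlen($flatText, 'UTF-8') <= $length) {
        return $flatText;
    }

    // cut by characters rather than bytes
    $summary = mb_substr($flatText, 0, $length, 'UTF-8');

    // only trim back to the last space when the cut fell inside a word
    $next = mb_substr($flatText, $length, 1, 'UTF-8');
    if ($next !== ' ') {
        $pos = mb_strrpos($summary, ' ', 0, 'UTF-8');
        if ($pos !== false) {
            $summary = mb_substr($summary, 0, $pos, 'UTF-8');
        }
    }

    // drop a trailing space, if any, before adding the ellipsis
    return rtrim($summary) . $hellip;
}

echo makeSummaryMb("Café crème brûlée is a dessert", 17), "\n"; // Café crème brûlée …
echo makeSummaryMb("Café crème brûlée is a dessert", 14), "\n"; // Café crème …
```

For plain ASCII input it should behave like the byte-based version, except in the end-of-word case.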
Happy summarizing!
B!
Thanks for sharing. I once tried the same with XSLT. It was for a WAP site with limited text blocks. I cut right after a comma or dot, depending on how long the phrases were. Gosh, it took me several days 🙂 XSLT is just weird…
Your function seemed a bit bloated to me. This is a working solution. Correct me if I’m wrong 🙂
function makeSummary($text, $length, $hellip = ' …')
{
    // strip the tags
    $text = strip_tags($text);
    // chop it off
    $text = wordwrap($text, $length, '');
    // split (we only need the first element!)
    $aText = explode('', $text);
    return $aText[0] . $hellip;
}
echo makeSummary("Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.", 100);
I suppose the line feed characters (\n – “backslash n”) got stripped while posting … 😉
In reply I must say that wordwrap was an unknown function to me … thanks for bringing it up, and that's a neat way of splitting the $text parameter.
However, your code needs some adjustments that I already built into mine:
– enforcing HTML text by calling html_entity_decode() first.
– a check to see whether $text is longer than $length before wordwrapping, exploding, and adding the $hellip.
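Put together, the wordwrap() approach with both adjustments might look like the sketch below. It assumes the stripped characters were indeed "\n", and the name makeSummaryWrapped() is made up here to keep it apart from the other versions.

```php
<?php
// Hypothetical name; a sketch combining wordwrap() with the two adjustments.
function makeSummaryWrapped($text, $length, $hellip = ' …') {
    // decode entities first, then strip the tags
    $text = strip_tags(html_entity_decode($text));

    // short enough: nothing to cut, so no ellipsis either
    if (strlen($text) <= $length) {
        return $text;
    }

    // wrap at $length characters without breaking words
    $wrapped = wordwrap($text, $length, "\n");

    // only the first line is needed
    $lines = explode("\n", $wrapped);

    return $lines[0] . $hellip;
}

echo makeSummaryWrapped("&lt;b&gt;Bold&lt;/b&gt; move", 100), "\n"; // Bold move
```

Two things to keep in mind: text that already contains line feeds is split on those too, so the summary may come out shorter than expected, and a single word longer than $length is not cut by wordwrap() in this mode.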
And oh, the functionality in the post was split into two, as the latter one was an extension to an already existing makeSummary() function … pretty sure I could knock off a few lines :p
I didn’t include html_entity_decode on purpose, since the “strip_tags” method automagically removes the remaining html tags. No harm done using it though, you never can trust input.
I was thinking of adding a check to see if the text is longer than the length, because it seems a bit odd to always add the $hellip, no matter how long the text is 🙂
Davy, the html_entity_decode() is needed imo as the strip_tags() can not strip tags from data that once (viz. when storing in database) was processed by htmlentities().
Sidenote: long live PHP’s naming consistency! >> why is it html_entity_decode() and not htmlentitydecode() if it’s htmlentities()? And why is it htmlentities() and not htmlentityencode() or html_entity_encode()?
True, I didn’t think about the text being encoded while fetching it from the database. The problem you do have with this method is that special chars like e.g. “&eacute;” will be converted to an actual é, and that’s not valid of course 🙂 This is assuming html_entity_decode converts all entities and not only the special characters.
Then we should run htmlspecialchars_decode() instead of html_entity_decode(), or just run htmlentities() (again) after the summary has been made …
The former would be better of course, as it calls fewer functions, although it would cause trouble when a < or > is contained within the text :-S
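For what it's worth, here is a rough sketch of the second option: decode everything, strip and cut the flat text, then encode again just before output. It uses htmlspecialchars() rather than htmlentities(), on the assumption that the page is served as UTF-8, so a literal é can stay as it is. The name makeEncodedSummary() is hypothetical.

```php
<?php
// Hypothetical helper; assumes UTF-8 text on a UTF-8 page.
function makeEncodedSummary($text, $length, $hellip = " …") {
    // decode all entities, then strip the tags
    $flatText = strip_tags(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));

    if (strlen($flatText) > $length) {
        // cut, then fall back to the last space (as reworkSummary() does)
        $summary = substr($flatText, 0, $length);
        $pos = strrpos($summary, " ");
        if ($pos !== false) {
            $summary = substr($summary, 0, $pos);
        }
        $flatText = $summary . $hellip;
    }

    // re-encode the characters that are special in HTML;
    // ENT_SUBSTITUTE guards against a multibyte character cut in half
    return htmlspecialchars($flatText, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo makeEncodedSummary("Tom &amp; Jerry &lt;em&gt;forever&lt;/em&gt;", 100), "\n"; // Tom &amp; Jerry forever
```

Because the cut happens on decoded text, an entity can never be split in half, and any leftover & or < is escaped again on the way out.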
I’d also throw in an optional forward/reverse looking sniffer for the end of a sentence. Finishing a precis with “sentence end. The …” looks a bit silly.
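A reverse-looking version of such a sniffer could be sketched like this. Both the name trimToSentenceEnd() and the $lookback parameter are assumptions made for the example, and it only looks backwards, not forwards.

```php
<?php
// Hypothetical helper; $lookback is an assumed tuning parameter (in bytes).
function trimToSentenceEnd($summary, $lookback = 40) {
    // find the last ". ", "! " or "? " in the summary
    $last = false;
    foreach (array(". ", "! ", "? ") as $mark) {
        $pos = strrpos($summary, $mark);
        if ($pos !== false && ($last === false || $pos > $last)) {
            $last = $pos;
        }
    }

    // close enough to the end: keep everything up to and including the mark
    if ($last !== false && strlen($summary) - $last <= $lookback) {
        return substr($summary, 0, $last + 1);
    }

    // no sentence end nearby: leave the summary as it is
    return $summary;
}

echo trimToSentenceEnd("sentence end. The"), "\n"; // sentence end.
```

It could be applied to the result of reworkSummary() before the ellipsis is added. Whether an ellipsis is still wanted after a full sentence is a matter of taste.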